AI Systems / Analytics Workflow
DataBrief AI
A bounded AI workflow that turns spreadsheet uploads into grounded business reports — without inventing unsupported metrics.
DataBrief AI analyzes CSV/XLSX files through semantic role detection, controlled Python execution, bounded repair, and report export. The system is designed to surface what the data supports and make unsupported metrics explicit.
Project Snapshot
What the system is
Type
AI reporting system
Use case
Spreadsheet analysis, business brief generation, decision support
Role
System architecture, product logic, frontend/backend implementation, AI workflow design
Stack
Next.js, FastAPI, Python analysis runtime, SQLite, CSV/XLSX parsing, controlled execution
Status
Prototype with a live demo and GitHub repository available
Business Context
The workflow problem behind the project
Reporting workflows often start with messy CSV or XLSX files and end with summaries that look confident even when the source data cannot support them. This project explores how AI can help teams produce faster business briefs while keeping unsupported metrics, caveats, and data-quality limits visible.
System / Solution
How the workflow is bounded
The system accepts spreadsheet files, profiles the dataset, identifies available fields, plans only supported analysis, executes controlled Python, evaluates failures, and exports a report whose claims are tied to computed outputs. Human review stays central because the system shows caveats instead of hiding uncertainty.
Inputs
CSV or XLSX uploads with unknown structure, missing values, duplicates, and inconsistent column roles.
Workflow
Upload validation, semantic profiling, route selection, bounded analysis planning, execution, evaluation, repair, and export.
Processing logic
Generated Python runs under static checks and time/resource limits; unsupported metrics are removed or marked unavailable.
Output
Structured business report, findings, charts, generated code, and exportable artifacts for review.
Guardrails
Caveats, unsupported-claim checks, repair limits, and a run store make the output more traceable.
Overview
The product promise is restraint
Spreadsheets often contain useful business signals, but AI reports become misleading when they calculate metrics the dataset cannot support. DataBrief AI is built around a stricter question: what can this file prove, and what should the system refuse to infer?
Order Count / Average Order Value
- Reason: No order ID was detected.
- Outcome: The report switches to purchase-line analysis instead of inventing order metrics.
Return/Cancel Rate = 0%
- Reason: No return, refund, cancel, or status field was detected.
- Outcome: The report marks the metric unavailable instead of presenting a false zero.
The value of the workflow is not that it tries to answer everything. It is that it makes unsupported claims visible before they become business reporting.
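Both cases reduce to one rule: a metric is computed only when the semantic roles it depends on were detected; otherwise it is returned as unavailable (`None`) with a caveat, never as zero. A minimal sketch, assuming hypothetical role names (`order_id`, `status`, `amount`) and a profiler that has already mapped roles to column names:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical semantic role names, used only for this sketch.
ORDER_ID, STATUS, AMOUNT = "order_id", "status", "amount"
RETURN_STATES = {"returned", "refunded", "cancelled", "canceled"}


@dataclass
class Metric:
    name: str
    value: Optional[float]  # None means "unavailable", never a silent zero
    caveat: str = ""


def supported_metrics(rows: list[dict], roles: dict[str, str]) -> list[Metric]:
    """Compute metrics only when the detected roles support them.

    `roles` maps a semantic role to the column the profiler assigned it to.
    """
    metrics: list[Metric] = []
    amount_col = roles.get(AMOUNT)

    if ORDER_ID in roles and amount_col and rows:
        orders = {row[roles[ORDER_ID]] for row in rows}
        total = sum(float(row[amount_col]) for row in rows)
        metrics.append(Metric("order_count", float(len(orders))))
        metrics.append(Metric("average_order_value", total / len(orders)))
    else:
        # Fall back to purchase-line analysis instead of inventing orders.
        metrics.append(Metric("order_count", None, "No order ID was detected."))
        metrics.append(Metric("purchase_line_count", float(len(rows))))

    if STATUS in roles and rows:
        returned = sum(
            1 for row in rows if str(row[roles[STATUS]]).lower() in RETURN_STATES
        )
        metrics.append(Metric("return_cancel_rate", returned / len(rows)))
    else:
        metrics.append(Metric(
            "return_cancel_rate", None,
            "No return, refund, cancel, or status field was detected.",
        ))
    return metrics
```

Keeping `None` distinct from `0.0` is what lets the report say "unavailable" rather than print a false 0%.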
Workflow
Four phases, clear boundaries
The four phases below summarize a stricter pipeline: validate the upload, profile the dataset, route the domain, plan supported analysis, execute generated Python, evaluate the result, repair only recoverable failures, then package the report and exports.
Profile
Read rows, columns, types, missing values, duplicates, and semantic field roles.
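As a rough sketch of the structural half of profiling, the function below reads a CSV with the standard library and reports shape, per-column missing values, duplicate rows, and a coarse numeric/text type. XLSX parsing and the type vocabulary are assumptions left out of this example:

```python
import csv
import io
from collections import Counter


def _is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def profile_csv(text: str) -> dict:
    """Return shape, missing-value counts, duplicate rows, and a coarse type per column."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = list(reader.fieldnames or [])

    # A cell counts as missing if it is absent or blank after stripping.
    missing = {
        col: sum(1 for row in rows if row.get(col) is None or row[col].strip() == "")
        for col in columns
    }

    # Every extra copy of an identical row counts as one duplicate.
    counts = Counter(tuple(row.get(col) for col in columns) for row in rows)
    duplicate_rows = sum(n - 1 for n in counts.values() if n > 1)

    types = {}
    for col in columns:
        values = [row[col].strip() for row in rows if row.get(col) and row[col].strip()]
        if not values:
            types[col] = "empty"
        elif all(_is_number(v) for v in values):
            types[col] = "numeric"
        else:
            types[col] = "text"

    return {"rows": len(rows), "columns": columns, "missing": missing,
            "duplicate_rows": duplicate_rows, "types": types}
```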
Route + Plan
Classify the dataset domain and build a bounded KPI/chart plan from supported signals only.
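One way to picture "supported signals only" is a catalogue where every KPI declares the roles it needs, and the plan keeps only the KPIs whose roles were all detected. The sketch below assumes a hypothetical KPI catalogue and domain signatures; the real routing rules are not shown in this case study:

```python
# Hypothetical KPI catalogue: each KPI names the semantic roles it needs.
KPI_REQUIREMENTS: dict[str, set[str]] = {
    "revenue_total": {"amount"},
    "average_order_value": {"amount", "order_id"},
    "revenue_by_region": {"amount", "region"},
    "revenue_over_time": {"amount", "date"},
    "return_cancel_rate": {"status"},
}

# Hypothetical domain signatures, checked in order; the first full match wins.
DOMAIN_SIGNALS: list[tuple[str, set[str]]] = [
    ("sales", {"amount", "product"}),
    ("operations", {"status", "date"}),
]


def route(roles: set[str]) -> str:
    """Pick a domain only when all of its signal roles were detected."""
    for domain, required in DOMAIN_SIGNALS:
        if required <= roles:
            return domain
    return "generic"


def build_plan(roles: set[str]) -> dict:
    """Keep KPIs whose required roles are present; record why the rest were skipped."""
    kpis = [name for name, need in KPI_REQUIREMENTS.items() if need <= roles]
    unavailable = {
        name: sorted(need - roles)
        for name, need in KPI_REQUIREMENTS.items()
        if not need <= roles
    }
    return {"domain": route(roles), "kpis": kpis, "unavailable": unavailable}
```

Recording the missing roles per skipped KPI gives the report a concrete caveat ("needs `order_id`") instead of a silent omission.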
Execute
Run controlled Python analysis to create metrics, charts, and structured artifacts.
Evaluate + Export
Classify execution results, repair only recoverable failures, ground the report, and package exports.
Output
A report surface that separates signal from caveat
The page keeps one strong report preview and three supporting output cards, instead of repeating the same mockup across every section.
Supported metrics
The report leads with metrics the uploaded file can actually support.
Grounded findings
Findings and data quality checks are shown before interpretation.
Artifacts + exports
Report Markdown, findings JSON, charts, and generated analysis code stay tied to the run.
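A minimal sketch of keeping artifacts tied to a run: bundle them into one archive whose entries are all prefixed with the run ID. The file names (`report.md`, `findings.json`, `analysis.py`, `charts/`) are hypothetical, not the prototype's actual export layout:

```python
import io
import json
import zipfile


def package_exports(run_id: str, report_md: str, findings: dict,
                    code: str, charts: dict[str, bytes]) -> bytes:
    """Bundle one run's artifacts into an in-memory zip, namespaced by run ID."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        prefix = f"{run_id}/"
        zf.writestr(prefix + "report.md", report_md)
        zf.writestr(prefix + "findings.json", json.dumps(findings, indent=2))
        zf.writestr(prefix + "analysis.py", code)  # the generated code that produced the numbers
        for name, data in charts.items():
            zf.writestr(prefix + "charts/" + name, data)
    return buffer.getvalue()
```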
Architecture
The software is a bounded pipeline, not a chatbot wrapper
The system is organized around explicit service boundaries: profile first, route from detectable signals, generate analysis code from templates, run it under guardrails, evaluate the result, and expose artifacts through stable export endpoints.
HTTP boundary
FastAPI accepts CSV/XLSX uploads, validates size and format, creates a run record, and exposes status, artifact, and export endpoints without leaking host paths.
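The checks at this boundary can live in framework-agnostic helpers that a FastAPI upload route would call before creating the run record. A minimal sketch, assuming a hypothetical 10 MB cap and a `/runs/{run_id}/artifacts/{name}` URL shape; it checks the extension only, so content sniffing would still be worth adding:

```python
import uuid
from pathlib import PurePosixPath

# Hypothetical limits; the prototype's actual size cap is not stated.
ALLOWED_SUFFIXES = {".csv", ".xlsx"}
MAX_UPLOAD_BYTES = 10 * 1024 * 1024


class UploadRejected(ValueError):
    """Raised for uploads the HTTP layer should answer with a 4xx error."""


def validate_upload(filename: str, size: int) -> str:
    """Check format and size, then return an opaque run ID for the new run record."""
    suffix = PurePosixPath(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        raise UploadRejected(f"unsupported format: {suffix or '(none)'}")
    if not 0 < size <= MAX_UPLOAD_BYTES:
        raise UploadRejected("file is empty or exceeds the size limit")
    return uuid.uuid4().hex


def artifact_url(run_id: str, artifact_path: str) -> str:
    """Expose artifacts by run ID and bare file name, never by host path."""
    name = PurePosixPath(artifact_path).name
    return f"/runs/{run_id}/artifacts/{name}"
```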
Semantic profiling
The backend profiles rows, columns, missing values, duplicates, inferred types, and semantic roles before deciding what analysis path is allowed.
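Semantic role detection is what later decides which analysis paths are allowed. The simplest version matches normalized column names against per-role patterns; the sketch below does only that, with hypothetical patterns, and a real profiler would also inspect values (for example, date parsing or ID uniqueness):

```python
import re

# Hypothetical name patterns per semantic role, checked in this order.
ROLE_PATTERNS = {
    "order_id": r"order[_ ]?(id|no|number)$",
    "date": r"(date|time|timestamp)",
    "amount": r"(amount|revenue|price|total|sales)",
    "status": r"(status|return|refund|cancel)",
    "region": r"(region|country|market)",
    "product": r"(product|sku|item)",
}


def detect_roles(columns: list[str]) -> dict[str, str]:
    """Map each role to the first column whose normalized name matches it.

    Each column gets at most one role; unmatched roles are simply absent.
    """
    roles: dict[str, str] = {}
    for column in columns:
        name = column.strip().lower()
        for role, pattern in ROLE_PATTERNS.items():
            if role not in roles and re.search(pattern, name):
                roles[role] = column
                break
    return roles
```

A missing key (for instance, no `order_id`) is the signal downstream steps use to mark order metrics unavailable.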
Controlled execution
Template-generated Python is checked with AST import rules and suspicious-pattern guards, then run in an isolated subprocess with time and resource limits.
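A minimal sketch of the two layers, assuming a hypothetical import allow-list and blocked-call set: an `ast` pass rejects code before it runs, and code that passes is executed in a separate interpreter with a wall-clock timeout. Memory and CPU limits are omitted here (on POSIX they could be added via `resource.setrlimit` in a `preexec_fn`), and AST checks can be evaded, so this is a guard rather than an OS-level sandbox:

```python
import ast
import subprocess
import sys

# Hypothetical policy: the prototype's real allow-list and patterns are not published.
ALLOWED_IMPORTS = {"math", "statistics", "json", "csv"}
BLOCKED_CALLS = {"eval", "exec", "compile", "__import__"}


def check_code(source: str) -> list[str]:
    """Static AST checks: reject imports outside the allow-list and blocked builtins."""
    problems: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]  # relative imports have no module name
        else:
            modules = []
        for module in modules:
            if module.split(".")[0] not in ALLOWED_IMPORTS:
                problems.append(f"import not allowed: {module}")
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in BLOCKED_CALLS):
            problems.append(f"call not allowed: {node.func.id}")
    return problems


def run_checked(source: str, timeout_s: float = 5.0) -> subprocess.CompletedProcess:
    """Run code that passed the checks in a separate interpreter with a timeout."""
    problems = check_code(source)
    if problems:
        raise ValueError("; ".join(problems))
    # -I is isolated mode: ignores PYTHON* env vars and the user site directory.
    # subprocess.run kills the child and raises TimeoutExpired if the timeout elapses.
    return subprocess.run(
        [sys.executable, "-I", "-c", source],
        capture_output=True, text=True, timeout=timeout_s,
    )
```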
Evaluation loop
Execution results are classified as success, recoverable, or unrecoverable. Recoverable failures get at most two deterministic repair attempts.
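The loop can be sketched with the executor and the repair step passed in as callables, so the retry budget is enforced in one place. The error-to-outcome mapping is hypothetical, and `execute` is assumed to return an error type name or `None`:

```python
from enum import Enum
from typing import Callable, Optional


class Outcome(Enum):
    SUCCESS = "success"
    RECOVERABLE = "recoverable"
    UNRECOVERABLE = "unrecoverable"


# Hypothetical mapping: which error types a deterministic repair may fix.
RECOVERABLE_ERRORS = {"KeyError", "ValueError", "TypeError"}
MAX_REPAIRS = 2


def classify(error: Optional[str]) -> Outcome:
    if error is None:
        return Outcome.SUCCESS
    return Outcome.RECOVERABLE if error in RECOVERABLE_ERRORS else Outcome.UNRECOVERABLE


def run_with_repair(
    code: str,
    execute: Callable[[str], Optional[str]],
    repair: Callable[[str, str], str],
) -> tuple[Outcome, int]:
    """Execute, classify, and repair at most MAX_REPAIRS times; return (outcome, retries)."""
    attempts = 0
    while True:
        error = execute(code)
        outcome = classify(error)
        if outcome is not Outcome.RECOVERABLE:
            return outcome, attempts
        if attempts == MAX_REPAIRS:
            # Repair budget spent: stop instead of looping indefinitely.
            return Outcome.UNRECOVERABLE, attempts
        code = repair(code, error)
        attempts += 1
```

The hard cap is what keeps the agentic repair step bounded: a failure that survives two repairs is reported, not retried.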
Grounded report
The report generator builds KPIs, findings, warnings, recommendations, and caveats from computed outputs only, then revises unsupported claims in one pass.
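One way to ground drafted claims is to write them as templates with metric placeholders and fill them only from computed outputs in a single pass. The sketch below assumes that template convention and a strict, hypothetical policy of dropping claims that cite no computed value:

```python
import re
from typing import Optional

# Matches "{metric}" or "{metric:.2f}" placeholders in a drafted claim.
PLACEHOLDER = re.compile(r"\{(\w+)(?::[^}]*)?\}")


def ground_claims(
    claims: list[str], computed: dict[str, Optional[float]]
) -> tuple[list[str], list[str]]:
    """One revision pass: keep claims backed by computed values, turn the rest into caveats."""
    findings: list[str] = []
    caveats: list[str] = []
    for claim in claims:
        names = PLACEHOLDER.findall(claim)
        missing = [name for name in names if computed.get(name) is None]
        if not names:
            # Hypothetical policy: a claim with no computed number behind it is not reported.
            caveats.append(f"Dropped unquantified claim: {claim}")
        elif missing:
            caveats.append(f"Not reported, unsupported metric(s): {', '.join(missing)}")
        else:
            findings.append(claim.format(**{name: computed[name] for name in names}))
    return findings, caveats
```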
Run store
SQLite tracks run status, route, plan, evaluation, retry count, generated code, report payload, expiry, and export metadata for each upload.
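A minimal sketch of such a run store with the standard `sqlite3` module. The column names are hypothetical, chosen to mirror the tracked fields; expiry is stored as a Unix timestamp so that purging expired runs is a simple numeric comparison:

```python
import json
import sqlite3
import time

# Hypothetical schema mirroring the run-store fields; JSON payloads are stored as text.
SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    run_id         TEXT PRIMARY KEY,
    status         TEXT NOT NULL,
    route          TEXT,
    plan_json      TEXT,
    evaluation     TEXT,
    retry_count    INTEGER NOT NULL DEFAULT 0,
    generated_code TEXT,
    report_json    TEXT,
    exports_json   TEXT,
    expires_at     REAL NOT NULL  -- Unix timestamp
)
"""

UPDATABLE = {"status", "route", "plan_json", "evaluation", "retry_count",
             "generated_code", "report_json", "exports_json"}


def create_run(conn: sqlite3.Connection, run_id: str, ttl_hours: float = 24) -> None:
    expires_at = time.time() + ttl_hours * 3600
    conn.execute(
        "INSERT INTO runs (run_id, status, expires_at) VALUES (?, ?, ?)",
        (run_id, "queued", expires_at),
    )
    conn.commit()


def update_run(conn: sqlite3.Connection, run_id: str, **fields) -> None:
    if not fields:
        return
    # Column names cannot be bound as parameters, so only whitelisted names are interpolated.
    unknown = set(fields) - UPDATABLE
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    values = [json.dumps(v) if isinstance(v, (dict, list)) else v for v in fields.values()]
    assignments = ", ".join(f"{name} = ?" for name in fields)
    conn.execute(f"UPDATE runs SET {assignments} WHERE run_id = ?", (*values, run_id))
    conn.commit()


def purge_expired(conn: sqlite3.Connection) -> int:
    """Delete runs past their expiry and return how many were removed."""
    cur = conn.execute("DELETE FROM runs WHERE expires_at < ?", (time.time(),))
    conn.commit()
    return cur.rowcount
```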
Result
A useful AI system because it stays inside its lane
DataBrief AI uses agentic patterns — routing, evaluation, bounded repair, and grounded generation — inside a deterministic workflow shell.
Product decision
A bounded workflow fits spreadsheet analysis better than a free-form agent because the task has a repeatable structure.
Reliability frame
The system values explicit caveats, controlled execution, and supported outputs over broad autonomy.
Public boundary
The case study shows architecture and behavior without positioning the prototype as production SaaS.
Boundaries
- It is not positioned as production SaaS.
- It is not a fully autonomous AI agent.
- It does not implement OS-level sandbox isolation.
- It does not enrich uploads with external web sources.
- Output quality depends on detectable column roles and dataset structure.
- Dedicated campaign-performance routing is a future domain extension.
Why It Matters
Reliability beats novelty
AI reporting tools tend to fail when they treat the model as an open-ended analyst. DataBrief AI takes the opposite approach: clear inputs, controlled execution, explicit constraints, and reviewable outputs. That makes the workflow a better fit for teams where reporting quality and decision traceability matter more than broad autonomy.
Client Relevance
Where this becomes useful
A client-facing version could help creative, brand, research, or operations teams reduce manual spreadsheet reporting, standardize recurring summaries, and improve AI-assisted analysis without presenting unsupported conclusions as fact.
Discuss a Similar AI System
Have a creative system worth extending?
If your team has a creative process, internal tool, campaign workflow, or brand system worth extending with AI, send a short brief and I'll help define the clearest system logic.