AI Systems / Analytics Workflow

DataBrief AI

A bounded AI workflow that turns spreadsheet uploads into grounded business reports — without inventing unsupported metrics.

DataBrief AI analyzes CSV/XLSX files through semantic role detection, controlled Python execution, bounded repair, and report export. The system is designed to surface what the data supports and make unsupported metrics explicit.

View Prototype → | View GitHub → | View System Logic

Bounded AI · CSV/XLSX · Python · Report Export · Evaluation Loop

Analysis Report
Data confidence: High

Execution complete · Unsupported metrics flagged

Primary metric: $10,482 (Total spend detected)

Purchase line count: 1,248 (Rows with purchase signal)

Average spend per row: $8.40 (Supported by row-level data)

Order Count and Average Order Value are unavailable because no order ID was detected.

Representative report preview showing supported metrics and explicit caveats.

Case Snapshot

The strategic brief

The problem, the system response, the available proof, the strategic value, and the intentional boundary.

01 Problem: Spreadsheet reports can sound confident even when the uploaded data cannot support the claimed metrics.
02 System: A bounded reporting pipeline for dataset profiling, controlled Python execution, evaluation, caveats, and grounded export.
03 Proof: Live prototype, public GitHub repository, representative report surfaces, generated code, charts, and exportable artifacts.
04 Value: Makes AI-assisted analysis faster without allowing unsupported conclusions to pass as business facts.
05 Limitation: It is a prototype, not production SaaS or an OS-level execution sandbox; output depends on detectable dataset structure.


Business Context

The workflow problem behind the project

Reporting workflows often start with messy CSV or XLSX files and end with summaries that look confident even when the source data cannot support them. This project explores how AI can help teams produce faster business briefs while keeping unsupported metrics, caveats, and data-quality limits visible.

System / Solution

How the workflow is bounded

The system accepts spreadsheet files, profiles the dataset, identifies available fields, plans only supported analysis, executes controlled Python, evaluates failures, and exports a report whose claims are tied to computed outputs. Human review stays central because the system shows caveats instead of hiding uncertainty.

Inputs

CSV or XLSX uploads with unknown structure, missing values, duplicates, and inconsistent column roles.

Workflow

Upload validation, semantic profiling, route selection, bounded analysis planning, execution, evaluation, repair, and export.

Processing logic

Generated Python runs under static checks and limits; unsupported metrics are removed or marked unavailable.

Output

Structured business report, findings, charts, generated code, and exportable artifacts for review.

Guardrails

Caveats, unsupported-claim checks, repair limits, and a run store make the output more traceable.

Overview

The product promise is restraint

Spreadsheets often contain useful business signals, but AI reports become misleading when they calculate metrics the dataset cannot support. DataBrief AI is built around a stricter question: what can this file prove, and what should the system refuse to infer?

Supported claim

Purchase line count and average spend per row

Evidence source: The file contains purchase rows and numeric spend values.

Unsupported claim avoided

Order Count / Average Order Value

Reason: No order ID was detected.
Outcome: The report switches to purchase-line analysis instead of inventing order metrics.

Supported claim

Missing-value and duplicate-row checks

Evidence source: These conditions can be measured directly from the uploaded table.

Unsupported claim avoided

Return/cancel rate = 0%

Reason: No return, refund, cancel, or status field was detected.
Outcome: The report marks the metric unavailable instead of presenting a false zero.
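
The supported and unsupported claims above can be read as a lookup from detected column roles to the metrics those roles allow. The following is a minimal sketch under stated assumptions: the role names (`spend`, `order_id`, `status`), the `METRIC_REQUIREMENTS` table, and the `resolve_metrics` helper are hypothetical and chosen for illustration; the prototype's actual rules may differ.

```python
# Hypothetical mapping: each metric lists the semantic roles it needs.
METRIC_REQUIREMENTS = {
    "purchase_line_count": {"spend"},
    "average_spend_per_row": {"spend"},
    "order_count": {"order_id"},
    "average_order_value": {"order_id", "spend"},
    "return_cancel_rate": {"status"},
}


def resolve_metrics(detected_roles):
    """Split metrics into supported ones and unavailable ones with a reason."""
    roles = set(detected_roles)
    supported, unavailable = [], {}
    for metric, required in METRIC_REQUIREMENTS.items():
        missing = required - roles
        if missing:
            # Unavailable is recorded explicitly; it is never reported as zero.
            unavailable[metric] = "missing role(s): " + ", ".join(sorted(missing))
        else:
            supported.append(metric)
    return supported, unavailable


# A file with spend values but no order ID or status column.
supported, unavailable = resolve_metrics({"spend", "product"})
```

With these inputs, the two purchase-line metrics are supported, while the order-level metrics and the return/cancel rate land in `unavailable` with a stated reason rather than a computed value.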

Design principle: Reliability over autonomy.

The value of the workflow is not that it tries to answer everything. It is that it makes unsupported claims visible before they become business reporting.

Workflow

Four phases, clear boundaries

The software keeps a strict pipeline behind the simplified presentation: validate the upload, profile the dataset, route the domain, plan supported analysis, execute generated Python, evaluate the result, repair only recoverable failures, then package the report and exports.

Profile → Route + Plan → Execute → Evaluate + Export

Profile

Read rows, columns, types, missing values, duplicates, and semantic field roles.

Route + Plan

Classify the dataset domain and build a bounded KPI/chart plan from supported signals only.
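
One way to picture the routing step is a score over detected roles with a safe fallback. This is only a sketch: the domain names, the `DOMAIN_SIGNALS` table, and the `min_hits` threshold are assumptions made for illustration, not the prototype's actual routing rules.

```python
# Hypothetical domain signatures: roles that suggest each domain.
DOMAIN_SIGNALS = {
    "sales": {"spend", "product", "order_id", "customer"},
    "inventory": {"sku", "stock_level", "warehouse"},
}


def route_domain(detected_roles, min_hits=2):
    """Pick the domain with the most matching roles, else fall back to 'generic'."""
    roles = set(detected_roles)
    best_domain, best_hits = "generic", 0
    for domain, signals in DOMAIN_SIGNALS.items():
        hits = len(roles & signals)
        if hits > best_hits:
            best_domain, best_hits = domain, hits
    # Weak evidence is not enough to commit to a domain-specific plan.
    return best_domain if best_hits >= min_hits else "generic"
```

Falling back to a generic route when evidence is thin keeps the plan bounded: a domain-specific KPI set is only selected when the file shows enough matching signals.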

Execute

Run controlled Python analysis to create metrics, charts, and structured artifacts.

Evaluate + Export

Classify execution results, repair only recoverable failures, ground the report, and package exports.

Output

A report surface that separates signal from caveat

The page keeps one strong report preview and three supporting output cards, instead of repeating the same mockup across every section.

Analysis Report
Data confidence: High

Execution complete · Unsupported metrics flagged

Primary metric: $10,482 (Total spend detected)

Purchase line count: 1,248 (Rows with purchase signal)

Average spend per row: $8.40 (Supported by row-level data)

Order Count and Average Order Value are unavailable because no order ID was detected.

Representative run preview. Unsupported order-level metrics are explicitly flagged.

Supported metrics

The report leads with metrics the uploaded file can actually support.

Grounded findings

Findings and data quality checks are shown before interpretation.

Artifacts + exports

Report Markdown, findings JSON, charts, and generated analysis code stay tied to the run.

Architecture

The software is a bounded pipeline, not a chatbot wrapper

The system is organized around explicit service boundaries: profile first, route from detectable signals, generate analysis code from templates, run it under guardrails, evaluate the result, and expose artifacts through stable export endpoints.

HTTP boundary

FastAPI accepts CSV/XLSX uploads, validates size and format, creates a run record, and exposes status, artifact, and export endpoints without leaking host paths.
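
The validation rules at this boundary can be sketched independently of the web framework. The snippet below is an assumption-laden illustration: `validate_upload`, the `ALLOWED_EXTENSIONS` set, and the 10 MiB cap are hypothetical and not taken from the project. It shows format and size checks, plus reducing a client-supplied filename to its base name so no path information is carried forward.

```python
from pathlib import PurePosixPath

ALLOWED_EXTENSIONS = {".csv", ".xlsx"}
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumed 10 MiB cap, not from the project


def validate_upload(filename, size_bytes):
    """Return (ok, detail): the sanitized name on success, a reason on failure."""
    # Normalize Windows separators, then keep only the final path component.
    safe_name = PurePosixPath(filename.replace("\\", "/")).name
    suffix = PurePosixPath(safe_name).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        return False, f"unsupported format: {suffix or 'none'}"
    if size_bytes <= 0:
        return False, "empty file"
    if size_bytes > MAX_UPLOAD_BYTES:
        return False, "file too large"
    return True, safe_name
```

In a FastAPI handler, a check like this would typically run before a run record is created, so rejected uploads never reach the analysis pipeline.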

Semantic profiling

The backend profiles rows, columns, missing values, duplicates, inferred types, and semantic roles before deciding what analysis path is allowed.
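
A reduced version of such a profile can be sketched with the standard library alone. Everything named here is hypothetical: `profile_csv`, the `ROLE_HINTS` keyword table, and the name-matching heuristic are illustrative assumptions, and type inference is left out for brevity.

```python
import csv
import io

# Hypothetical name hints used to guess a semantic role from a column header.
ROLE_HINTS = {
    "order_id": ("order_id", "order id", "invoice"),
    "spend": ("amount", "spend", "price", "total"),
    "status": ("status", "refund", "return", "cancel"),
}


def profile_csv(text):
    """Profile CSV text: shape, missing values, duplicate rows, guessed roles."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = reader.fieldnames or []
    # A cell counts as missing when it is absent, empty, or whitespace only.
    missing = {c: sum(1 for r in rows if not (r[c] or "").strip()) for c in columns}
    seen, duplicates = set(), 0
    for r in rows:
        key = tuple(r[c] for c in columns)
        if key in seen:
            duplicates += 1
        seen.add(key)
    roles = {}
    for c in columns:
        name = c.strip().lower()
        for role, hints in ROLE_HINTS.items():
            if any(h in name for h in hints):
                roles[c] = role
                break
    return {
        "rows": len(rows),
        "columns": columns,
        "missing": missing,
        "duplicate_rows": duplicates,
        "roles": roles,
    }


sample = "product,amount\nTea,4.50\nTea,4.50\nMug,\n"
profile = profile_csv(sample)
```

For the sample above, the profile reports three rows, one missing `amount`, one duplicate row, and a single detected role (`amount` as `spend`). No order ID role is found, which is the kind of signal that would later make order-level metrics unavailable.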

Controlled execution

Template-generated Python is checked with AST import rules and suspicious-pattern guards, then run in an isolated subprocess with time and resource limits.
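
The static check and subprocess step might look roughly like the sketch below. The allowlist, the blocklist, and the function names `check_code` and `run_checked` are assumptions for illustration; the project's real rules are not shown in this case study. Only a timeout is applied here; memory or CPU caps (for example via `resource.setrlimit` on POSIX systems) are omitted.

```python
import ast
import subprocess
import sys

ALLOWED_IMPORTS = {"math", "json", "statistics"}  # assumed allowlist
BLOCKED_CALLS = {"eval", "exec", "open", "__import__"}  # assumed blocklist


def check_code(source):
    """Return a list of violations found by walking the parsed AST."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            names = []
        for name in names:
            # Compare the top-level package against the allowlist.
            if name.split(".")[0] not in ALLOWED_IMPORTS:
                problems.append(f"import not allowed: {name}")
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in BLOCKED_CALLS):
            problems.append(f"call not allowed: {node.func.id}")
    return problems


def run_checked(source, timeout_s=5):
    """Run code in a separate interpreter only if the static checks pass."""
    problems = check_code(source)
    if problems:
        return {"status": "rejected", "problems": problems}
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", source],  # -I: isolated mode
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "problems": []}
    return {"status": "ok" if proc.returncode == 0 else "error",
            "stdout": proc.stdout, "stderr": proc.stderr}
```

Checks of this kind reduce risk for template-generated code but are not a sandbox, which matches the stated boundary that the prototype does not implement OS-level isolation.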

Evaluation loop

Execution results are classified as success, recoverable, or unrecoverable. Recoverable failures get at most two deterministic repair attempts.
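
The bounded repair loop can be sketched as follows. The two-attempt cap comes from the description above; the error types treated as recoverable, the result dictionary shape, and the `execute` and `repair` callables are hypothetical stand-ins.

```python
MAX_REPAIRS = 2  # the case study states at most two repair attempts

# Hypothetical set of error types treated as recoverable.
RECOVERABLE_ERRORS = {"KeyError", "ValueError"}


def classify(result):
    """Map an execution result to success, recoverable, or unrecoverable."""
    if result["ok"]:
        return "success"
    if result.get("error_type") in RECOVERABLE_ERRORS:
        return "recoverable"
    return "unrecoverable"


def run_with_repair(execute, repair, plan):
    """Execute a plan, applying a deterministic repair at most MAX_REPAIRS times."""
    retries = 0
    while True:
        result = execute(plan)
        outcome = classify(result)
        if outcome == "recoverable" and retries < MAX_REPAIRS:
            plan = repair(plan, result)
            retries += 1
            continue
        if outcome == "recoverable":
            outcome = "unrecoverable"  # repair budget exhausted
        return {"outcome": outcome, "retries": retries, "result": result}


def execute(plan):
    # Stand-in executor: fails while the plan references a missing column.
    if "order_id" in plan["columns"]:
        return {"ok": False, "error_type": "KeyError", "detail": "order_id"}
    return {"ok": True, "error_type": None}


def repair(plan, result):
    # Deterministic repair: drop the column named in the error.
    return {"columns": [c for c in plan["columns"] if c != result["detail"]]}


report = run_with_repair(execute, repair, {"columns": ["amount", "order_id"]})
```

Here the first run fails on the missing `order_id` column, one repair removes it, and the second run succeeds. A failure that keeps recurring stops after two repairs and is reported as unrecoverable instead of looping.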

Grounded report

The report generator builds KPIs, findings, warnings, recommendations, and caveats from computed outputs only, then revises unsupported claims in one pass.
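
A single grounding pass of this kind can be sketched as a filter over candidate claims. The claim structure, the `ground_claims` helper, and the caveat wording are assumptions for illustration only.

```python
def ground_claims(claims, computed):
    """Keep claims backed by computed values; turn the rest into caveats."""
    findings, caveats = [], []
    for claim in claims:
        metric = claim["metric"]
        if metric in computed:
            # The sentence is rendered from the computed value, not free text.
            findings.append(claim["template"].format(value=computed[metric]))
        else:
            caveats.append(f"{metric} is unavailable: no computed output supports it.")
    return {"findings": findings, "caveats": caveats}


computed = {"purchase_line_count": 1248, "total_spend": 10482.0}
claims = [
    {"metric": "purchase_line_count", "template": "Purchase line count: {value:,}"},
    {"metric": "order_count", "template": "Order count: {value:,}"},
]
report = ground_claims(claims, computed)
```

The purchase line count is rendered from its computed value, while the order count, which has no computed output behind it, becomes a caveat instead of a number.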

Run store

SQLite tracks run status, route, plan, evaluation, retry count, generated code, report payload, expiry, and export metadata for each upload.
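
A run table covering most of the fields listed above might be sketched like this. The table name, column names, and status values are hypothetical, and export metadata is left out to keep the example short.

```python
import sqlite3

# Hypothetical schema; column names are illustrative, not the project's.
SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    run_id      TEXT PRIMARY KEY,
    status      TEXT NOT NULL,
    route       TEXT,
    plan_json   TEXT,
    evaluation  TEXT,
    retry_count INTEGER NOT NULL DEFAULT 0,
    code        TEXT,
    report_json TEXT,
    expires_at  TEXT
)
"""


def open_store(path=":memory:"):
    """Open the run store and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def create_run(conn, run_id):
    """Insert a new run in the initial 'queued' state."""
    conn.execute("INSERT INTO runs (run_id, status) VALUES (?, 'queued')", (run_id,))
    conn.commit()


def update_run(conn, run_id, status, retry_count):
    """Record the latest status and retry count for a run."""
    conn.execute(
        "UPDATE runs SET status = ?, retry_count = ? WHERE run_id = ?",
        (status, retry_count, run_id),
    )
    conn.commit()


conn = open_store()
create_run(conn, "run-001")
update_run(conn, "run-001", "complete", 1)
row = conn.execute(
    "SELECT status, retry_count FROM runs WHERE run_id = ?", ("run-001",)
).fetchone()
```

Keeping the retry count and generated code on the same row as the report payload is what lets a reviewer trace a given export back to the exact run that produced it.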

Result

A useful AI system because it stays inside its lane

DataBrief AI uses agentic patterns — routing, evaluation, bounded repair, and grounded generation — inside a deterministic workflow shell.

Product decision

A bounded workflow fits spreadsheet analysis better than a free-form agent because the task has a repeatable structure.

Reliability frame

The system values explicit caveats, controlled execution, and supported outputs over broad autonomy.

Public boundary

The case study shows architecture and behavior without positioning the prototype as production SaaS.

Built with

Next.js frontend · FastAPI backend · Python analysis runtime · CSV/XLSX parsing · Static code checks · Controlled execution · Report export · Semantic quality tests

Boundaries

It is not positioned as production SaaS.
It is not a fully autonomous AI agent.
It does not implement OS-level sandbox isolation.
It does not enrich uploads with external web sources.
Output quality depends on detectable column roles and dataset structure.
Dedicated campaign-performance routing is a future domain extension.


Why It Matters

Reliability beats novelty

AI reporting tools tend to fail when they treat the model as an open-ended analyst. DataBrief AI takes the opposite approach: clear inputs, controlled execution, explicit constraints, and reviewable outputs. That makes the workflow a better fit for teams where reporting quality and decision traceability matter more than broad autonomy.

Client Relevance

Where this becomes useful

A client-facing version could help creative, brand, research, or operations teams reduce manual spreadsheet reporting, standardize recurring summaries, and improve AI-assisted analysis without presenting unsupported conclusions as fact.

Discuss a Similar AI System

Have a creative system worth extending?

If your team has a creative process, internal tool, campaign workflow, or brand system worth extending with AI, send a short brief and I'll help define the clearest system logic.

Start Creative Systems Brief
