Website Audit Agent
An evidence-backed website audit workflow that turns public websites into structured UX, SEO, performance, content, and prospect-intelligence reports.
The goal was to turn website review from subjective opinion into a repeatable evidence workflow, while keeping AI useful but bounded.
Example finding: Primary CTA is visible but not consistently reinforced above the fold.
Evidence label: Observed
Project Snapshot
What the system is
Type
Evidence-based AI audit workflow
Use case
Website diagnostics, UX/SEO/conversion review, internal prospect intelligence
Role
System architecture, workflow logic, product framing, frontend/backend implementation
Stack
Next.js, TypeScript, Postgres, pg-boss, Playwright, Gemini, Vercel
Status
Private internal prototype; public GitHub repository available
Business Context
The workflow problem behind the project
Website reviews are high-friction because teams need evidence, prioritization, and commercial interpretation, not generic AI feedback. This project explores how AI can support faster diagnostics while keeping measurement, observation, and inference separate.
System / Solution
How the workflow is bounded
The system takes a public URL, captures rendered or static evidence, persists the run, applies deterministic scoring, and only then passes accepted findings into an AI synthesis layer. The report keeps evidence labels visible so recommendations can be reviewed instead of blindly trusted.
Inputs
Public website URLs and captured page evidence from browser-first or static collection.
Workflow
URL intake, capture, evidence storage, deterministic scoring, bounded synthesis, and internal report assembly.
Processing logic
Rules create audit findings and scores before the model summarizes accepted evidence.
Output
Private report with scores, confidence, evidence labels, top findings, and prioritized recommendations.
Guardrails
The LLM cannot create findings, modify scores, invent metrics, or present inferred claims as measured truth.
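The rules-before-model ordering described above can be sketched as a pure scoring function. This is a minimal illustration, not the project's implementation: the evidence fields, rule IDs, categories, and penalty weights are all assumptions made for the example.

```typescript
// All names, rule IDs, and weights below are illustrative assumptions.
type Category = "ux" | "seo" | "performance" | "content";

interface PageEvidence {
  title: string | null;
  metaDescription: string | null;
  h1Count: number;
}

interface Finding {
  id: string;
  category: Category;
  message: string;
  penalty: number; // points deducted from the category score
}

// A rule inspects captured evidence and returns a finding or nothing.
type Rule = (evidence: PageEvidence) => Finding | null;

const rules: Rule[] = [
  (e) =>
    e.title
      ? null
      : { id: "seo.missing-title", category: "seo", message: "Page has no <title>.", penalty: 20 },
  (e) =>
    e.metaDescription
      ? null
      : { id: "seo.missing-description", category: "seo", message: "Page has no meta description.", penalty: 10 },
  (e) =>
    e.h1Count === 1
      ? null
      : { id: "content.h1-count", category: "content", message: `Expected one <h1>, found ${e.h1Count}.`, penalty: 10 },
];

// Pure function: the same evidence always yields the same findings and scores.
// No model is involved at this stage.
function scoreEvidence(evidence: PageEvidence): { findings: Finding[]; scores: Record<Category, number> } {
  const findings = rules
    .map((rule) => rule(evidence))
    .filter((f): f is Finding => f !== null);
  const scores: Record<Category, number> = { ux: 100, seo: 100, performance: 100, content: 100 };
  for (const f of findings) {
    scores[f.category] = Math.max(0, scores[f.category] - f.penalty);
  }
  return { findings, scores };
}
```

Because the function has no side effects and no model call, a rerun on the same stored evidence should reproduce the same report scores, which is what makes the findings reviewable.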
Problem
The problem with AI audits is trust.
AI website-audit tools often collapse observation, scoring, and interpretation into one opaque model response. That makes the output fast, but difficult to trust.
Website Audit Agent was designed around the opposite principle: before the model says anything, the system must capture evidence, classify confidence, and define what is actually known.
Overview
The audit truth is deterministic. The AI is interpretive.
This is not a free-form audit chatbot. The model does not create findings, score categories, invent metrics, or turn weak signals into measured facts. The LLM layer synthesizes only accepted evidence, which makes the workflow more credible for internal prospecting.
Capture evidence first
The workflow starts with public website evidence, not model speculation. Rendered capture is attempted first, with static public evidence as fallback.
Score with rules
Findings and category scores are produced deterministically from captured evidence before any synthesis layer is involved.
Interpret after acceptance
Gemini and the Prospect Audit Agent receive accepted findings only, then translate them into useful internal acquisition intelligence.
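The "rendered first, static as fallback" rule can be sketched as a small wrapper. The type and function names are hypothetical, and the two capture functions are injected as parameters so the fallback logic does not assume any particular Playwright or HTTP call.

```typescript
// Hypothetical types and names; capture functions are injected so the
// fallback logic stays independent of any specific browser library.
type CaptureMode = "rendered" | "static";

interface CapturedPage {
  url: string;
  html: string;
  mode: CaptureMode;
}

type Capturer = (url: string) => Promise<string>;

async function captureWithFallback(
  url: string,
  renderCapture: Capturer, // e.g. a headless-browser capture
  staticCapture: Capturer, // e.g. a plain HTTP fetch of the public HTML
): Promise<CapturedPage> {
  try {
    const html = await renderCapture(url);
    return { url, html, mode: "rendered" };
  } catch {
    // Rendering failed or was blocked: fall back to static evidence and
    // record the mode so later stages know what kind of evidence they hold.
    const html = await staticCapture(url);
    return { url, html, mode: "static" };
  }
}
```

Recording the capture mode on the result is the design choice that matters here: scoring and reporting can then decide what is safe to claim from static-only evidence.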
Workflow
Controlled pipeline from public URL to private report
The architecture separates capture, persistence, scoring, synthesis, and report assembly so each stage has a clear owner and failure mode.
URL Intake
A user submits a public website URL.
Browser / Static Capture
Rendered evidence is captured first, then static public evidence is used if rendering fails or is blocked.
Evidence Store
Snapshots and page evidence are persisted for the run.
Deterministic Scoring
Rules generate findings and category scores.
Bounded AI Synthesis
Gemini and the Prospect Audit Agent summarize only accepted findings.
Internal Report
The output becomes private acquisition intelligence.
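The stage boundaries above can be sketched as an ordered runner that stops at the first failing stage and reports which one failed. The stage identifiers and the `runPipeline` function are assumptions for illustration. Handlers are synchronous here to keep the sketch short; real stages would likely be asynchronous jobs.

```typescript
// Illustrative stage names that mirror the workflow steps; not the project's identifiers.
const STAGES = ["intake", "capture", "evidence_store", "scoring", "synthesis", "report"] as const;
type Stage = (typeof STAGES)[number];

type PipelineResult =
  | { ok: true }
  | { ok: false; failedStage: Stage; error: string };

// Runs each stage in order. A failure stops the run, so a later stage
// (for example synthesis) never executes on incomplete upstream output.
function runPipeline(handlers: Record<Stage, () => void>): PipelineResult {
  for (const stage of STAGES) {
    try {
      handlers[stage]();
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      return { ok: false, failedStage: stage, error };
    }
  }
  return { ok: true };
}
```

With this shape, a scoring failure surfaces as `failedStage: "scoring"` and the synthesis handler is never called.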
Evidence model
Every claim needs an evidence status
The system can interpret, but it must not pretend inference is measurement. Evidence labels make that boundary visible inside the report.
Measured
Directly captured or computed.
Observed
Visible in captured page evidence.
Inferred
Interpretation from available signals; never presented as measured truth.
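One way to make the label mandatory is to encode it in the type of a claim, so a claim without an evidence status does not compile. The `Claim` shape and `renderClaim` helper are hypothetical; only the three labels and their meanings come from the list above.

```typescript
// The three labels come from the evidence model; everything else is illustrative.
type EvidenceLabel = "Measured" | "Observed" | "Inferred";

const LABEL_MEANING: Record<EvidenceLabel, string> = {
  Measured: "Directly captured or computed.",
  Observed: "Visible in captured page evidence.",
  Inferred: "Interpretation from available signals.",
};

interface Claim {
  text: string;
  label: EvidenceLabel; // required: a claim cannot exist without a status
}

// Keeps the label attached to the claim wherever it is displayed.
function renderClaim(claim: Claim): string {
  return `${claim.text} (Evidence label: ${claim.label})`;
}
```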
Agentic layer
The AI is downstream by design.
This is a hybrid workflow-agent system. The deterministic shell owns capture, scoring, persistence, status, and report assembly. The LLM layer owns summary, prioritization, explanation, and prospect intelligence. The Prospect Audit Agent does not browse freely, rewrite findings, or create scores.
What the LLM can and cannot do
Allowed
- Summarize accepted findings.
- Prioritize recommendations.
- Translate audit evidence into internal prospect intelligence.
- Explain why a finding matters commercially.
Blocked
- Create audit findings.
- Modify category scores.
- Invent metrics.
- Present inferred claims as measured truth.
- Make unsupported revenue claims.
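The "cannot create findings" rule can also be checked after the model responds. The sketch below assumes the model is asked for structured output in which every item cites the ID of an accepted finding. That output shape and the `guardSynthesis` name are assumptions, not the project's API.

```typescript
// Hypothetical shapes: assumes every synthesis item cites an accepted finding ID.
interface AcceptedFinding {
  id: string;
  message: string;
}

interface SynthesisItem {
  findingId: string;
  explanation: string;
}

interface GuardResult {
  kept: SynthesisItem[];
  rejected: SynthesisItem[];
}

// Keep only items that point at a finding the deterministic layer accepted.
// Anything else is treated as a model-created finding and set aside.
function guardSynthesis(accepted: AcceptedFinding[], items: SynthesisItem[]): GuardResult {
  const acceptedIds = new Set(accepted.map((f) => f.id));
  const result: GuardResult = { kept: [], rejected: [] };
  for (const item of items) {
    (acceptedIds.has(item.findingId) ? result.kept : result.rejected).push(item);
  }
  return result;
}
```

This covers only the first blocked behavior. Invented metrics or unsupported revenue claims inside an explanation would need separate checks or human review.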
Report surface
A private audit report built from accepted evidence
The report surface is intentionally internal. It presents scores, confidence, evidence labels, and top findings without exposing the private Vercel deployment as a public demo.
Example finding: Primary CTA is visible but weakly reinforced above the fold.
Safety / Access boundary
Public repository. Private operating surface.
The project is public for portfolio and reference purposes, while the operating surface stays private behind an internal login and protected job endpoint.
Result
A useful AI system because it stays inside its lane.
The result is a private acquisition workflow that can turn a public website URL into a structured internal report with evidence labels, category scores, and prioritized findings.
Repeatable audit pipeline
A public URL moves through capture, evidence storage, scoring, synthesis, and reporting with explicit stage boundaries.
Controlled AI synthesis
The LLM summarizes accepted findings and prospect context without creating audit truth.
Evidence discipline
Measured, observed, and inferred claims stay distinct throughout the report.
Portfolio-safe architecture
The repo can be shown publicly while the operating surface and generated reports remain private.
Stack
Built as a production-oriented prototype
The stack is practical: Next.js for the app surface, Postgres and pg-boss for durable runs, Playwright for capture, Gemini for bounded synthesis, and a worker endpoint protected separately from the internal session gate.
Boundaries / Future Improvements
These are design boundaries, not excuses.
The current scope is deliberately narrow so the case study does not overclaim what the system does.
- Not a public SaaS.
- No public live demo.
- Prospect Intelligence is internal guidance, not audit truth.
- Static-only reports intentionally exclude visual/mobile/above-the-fold scoring.
- AI synthesis depends on accepted findings.
- Future work includes evals, model comparison, observability, and real audit examples.
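The static-only boundary in the list above could be enforced by filtering which checks are allowed to produce scores for a given capture mode. The check names and the `scorableChecks` helper are hypothetical; the source states only that static-only reports exclude visual, mobile, and above-the-fold scoring.

```typescript
// Hypothetical check names; only the excluded trio is taken from the list above.
type CaptureMode = "rendered" | "static";
type Check = "seo" | "content" | "visual" | "mobile" | "aboveTheFold";

// Checks that need a rendered page and are skipped for static-only evidence.
const RENDER_ONLY: ReadonlySet<Check> = new Set<Check>(["visual", "mobile", "aboveTheFold"]);

function scorableChecks(mode: CaptureMode, all: Check[]): Check[] {
  return mode === "rendered" ? all : all.filter((check) => !RENDER_ONLY.has(check));
}
```

Skipping a check is different from scoring it as zero: an excluded category is absent from the report rather than presented as a poor result.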
Why It Matters
Reliability beats novelty
AI audits become risky when they collapse evidence and interpretation into one model response. This system defines clear inputs, accepted evidence, deterministic scoring, and reviewable AI synthesis, which makes the recommendations more useful for teams that need repeatable diagnostics.
Client Relevance
Where this becomes useful
A client-facing version could help brand, content, product, or digital teams standardize site audits, speed up UX/SEO/conversion diagnostics, and turn messy website observations into clearer decision support.
Discuss a Similar AI System
Have a creative system worth extending?
If your team has a creative process, internal tool, campaign workflow, or brand system worth extending with AI, send a short brief and I'll help define the clearest system logic.