Website Audit Agent

An evidence-backed website audit workflow that turns public websites into structured UX, SEO, performance, content, and prospect-intelligence reports.

The goal was to turn website review from subjective opinion into a repeatable evidence workflow, while keeping AI useful but bounded.

Rule-first audit engine
Browser-first capture
Bounded LLM synthesis
Private internal tool

Website Audit Agent: Complete

Capture: Rendered browser
Evidence confidence: High

UX: 82
SEO: 74
Performance: 68
Content: 79
Accessibility: 71

Top finding

Primary CTA is visible but not consistently reinforced above the fold.

Evidence label: Observed

Representative preview. Not a live audit result.

Case Snapshot

The strategic brief

The problem, the system response, the available proof, the strategic value, and the intentional boundary.

01 Problem: AI website audits often mix observation, scoring, and inference into one opaque answer.
02 System: A public-URL capture pipeline with evidence storage, deterministic scoring, bounded synthesis, and private report assembly.
03 Proof: Public GitHub repository, anonymized report preview, evidence labels, workflow diagrams, and documented system boundaries.
04 Value: Creates faster, repeatable website diagnostics while preserving the difference between what was measured, observed, and inferred.
05 Limitation: It audits public website evidence only and cannot access private analytics, conversion data, or proprietary systems.

Business Context

The workflow problem behind the project

Website reviews are high-friction because teams need evidence, prioritization, and commercial interpretation, not generic AI feedback. This project explores how AI can support faster diagnostics while keeping measurement, observation, and inference separate.

System / Solution

How the workflow is bounded

The system takes a public URL, captures rendered or static evidence, persists the run, applies deterministic scoring, and only then passes accepted findings into an AI synthesis layer. The report keeps evidence labels visible so recommendations can be reviewed instead of blindly trusted.

Inputs

Public website URLs and captured page evidence from browser-first or static collection.

Workflow

URL intake, capture, evidence storage, deterministic scoring, bounded synthesis, and internal report assembly.

Processing logic

Rules create audit findings and scores before the model summarizes accepted evidence.

Output

Private report with scores, confidence, evidence labels, top findings, and prioritized recommendations.

Guardrails

The LLM cannot create findings, modify scores, invent metrics, or present inferred claims as measured truth.

Problem

The problem with AI audits is trust.

AI website-audit tools often collapse observation, scoring, and interpretation into one opaque model response. That makes the output fast, but difficult to trust.

Website Audit Agent was designed around the opposite principle: before the model says anything, the system must capture evidence, classify confidence, and define what is actually known.

Overview

The audit truth is deterministic. The AI is interpretive.

This is not a free-form audit chatbot. The model does not create findings, score categories, invent metrics, or turn weak signals into measured facts. The LLM layer only synthesizes accepted evidence, which makes the workflow more credible for internal prospecting.

Capture evidence first

The workflow starts with public website evidence, not model speculation. Rendered capture is attempted first, with static public evidence as fallback.

Score with rules

Findings and category scores are produced deterministically from captured evidence before any synthesis layer is involved.

Interpret after acceptance

Gemini and the Prospect Audit Agent receive accepted findings only, then translate them into useful internal acquisition intelligence.
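
To make "score with rules" concrete, here is a minimal sketch of deterministic scoring, not the project's actual rule set. Every name (PageEvidence, Finding, Rule, scoreEvidence), every threshold (60 characters, 90% alt coverage), and every penalty value is an assumption chosen for illustration. The point it shows is that rules are pure functions of captured evidence, so the same evidence always yields the same findings and scores.

```typescript
// Hypothetical shapes: none of these names come from the project repository.
type Category = "UX" | "SEO" | "Performance" | "Content" | "Accessibility";

interface PageEvidence {
  titleLength: number;
  h1Count: number;
  imageCount: number;
  imagesWithAlt: number;
}

interface Finding {
  id: string;
  category: Category;
  message: string;
  penalty: number; // points deducted from the category score (assumed scheme)
}

// A rule is a pure function: same evidence in, same finding (or none) out.
type Rule = (evidence: PageEvidence) => Finding | null;

const rules: Rule[] = [
  (e) =>
    e.titleLength === 0 || e.titleLength > 60
      ? { id: "seo-title-length", category: "SEO", message: "Title is missing or longer than 60 characters.", penalty: 10 }
      : null,
  (e) =>
    e.h1Count !== 1
      ? { id: "seo-h1-count", category: "SEO", message: "Page should have exactly one h1.", penalty: 5 }
      : null,
  (e) =>
    e.imageCount > 0 && e.imagesWithAlt / e.imageCount < 0.9
      ? { id: "a11y-alt-coverage", category: "Accessibility", message: "Less than 90% of images have alt text.", penalty: 15 }
      : null,
];

function scoreEvidence(evidence: PageEvidence): { findings: Finding[]; scores: Record<Category, number> } {
  // Run every rule and keep only the ones that produced a finding.
  const findings = rules
    .map((rule) => rule(evidence))
    .filter((f): f is Finding => f !== null);

  // Every category starts at 100 and loses the penalty of each finding in it.
  const scores: Record<Category, number> = { UX: 100, SEO: 100, Performance: 100, Content: 100, Accessibility: 100 };
  for (const f of findings) {
    scores[f.category] = Math.max(0, scores[f.category] - f.penalty);
  }
  return { findings, scores };
}

const result = scoreEvidence({ titleLength: 72, h1Count: 1, imageCount: 10, imagesWithAlt: 6 });
// result.scores.SEO is 90 and result.scores.Accessibility is 85
```

Because no model call appears anywhere in this path, a score can be reproduced and audited from the stored evidence alone.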

Workflow

Controlled pipeline from public URL to private report

The architecture separates capture, persistence, scoring, synthesis, and report assembly so each stage has a clear owner and failure mode.

URL Intake → Browser / Static Capture → Evidence Store → Deterministic Scoring → Bounded AI Synthesis → Internal Report

URL Intake

A user submits a public website URL.

Browser / Static Capture

Rendered capture is attempted first; static public evidence is used if rendering fails or is blocked.

Evidence Store

Snapshots and page evidence are persisted for the run.

Deterministic Scoring

Rules generate findings and category scores.

Bounded AI Synthesis

Gemini and the Prospect Audit Agent summarize only accepted findings.

Internal Report

The output becomes private acquisition intelligence.
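
The capture stage's fallback behavior could look roughly like the sketch below. It is a hedged illustration rather than the project's code: CaptureFn, captureWithFallback, and the fallbackReason field are hypothetical names, and the two capture functions are injected so the example runs without a browser (in the real system the rendered path would presumably wrap Playwright).

```typescript
type CaptureMode = "rendered" | "static";

interface Capture {
  mode: CaptureMode;
  url: string;
  html: string;
  fallbackReason?: string; // set only when rendered capture failed (assumed field)
}

// Capture functions are injected so the sketch does not depend on a browser.
type CaptureFn = (url: string) => Promise<string>;

async function captureWithFallback(
  url: string,
  renderPage: CaptureFn,
  fetchStatic: CaptureFn,
): Promise<Capture> {
  // Intake check: only http(s) URLs are accepted.
  if (!/^https?:\/\//i.test(url)) {
    throw new Error(`Not a public http(s) URL: ${url}`);
  }
  try {
    // Rendered capture is attempted first.
    return { mode: "rendered", url, html: await renderPage(url) };
  } catch (err) {
    // Rendering failed or was blocked: fall back to static evidence
    // and record why, so the report can state its capture mode honestly.
    const fallbackReason = err instanceof Error ? err.message : String(err);
    return { mode: "static", url, html: await fetchStatic(url), fallbackReason };
  }
}

// Usage with stub capture functions standing in for a browser and an HTTP fetch.
const blockedRenderer: CaptureFn = async () => {
  throw new Error("render blocked");
};
const staticFetcher: CaptureFn = async (u) => `<html><title>${u}</title></html>`;

const pendingCapture = captureWithFallback("https://example.com", blockedRenderer, staticFetcher);
```

Carrying the capture mode forward on the record is what lets later stages decide which rules are allowed to run.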

Evidence model

Every claim needs an evidence status

The system can interpret, but it must not pretend inference is measurement. Evidence labels make that boundary visible inside the report.

Measured

Directly captured or computed.

Observed

Visible in captured page evidence.

Inferred

Interpretation from available signals; never presented as measured truth.
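
One way to make the three labels hard to blur is to encode them in the type system. The sketch below is an assumption about how that might be modeled, not the project's schema: the Claim union, its field names, and renderClaim are all hypothetical. The design idea is that only a measured claim can carry a numeric value, and an inferred claim must list the signals it rests on.

```typescript
// Hypothetical claim model: field names are illustrative only.
type Claim =
  | { label: "measured"; text: string; value: number; source: string }
  | { label: "observed"; text: string; source: string }
  | { label: "inferred"; text: string; basedOn: string[] };

// The label is always printed first, so a reader never sees an unlabeled claim.
function renderClaim(claim: Claim): string {
  switch (claim.label) {
    case "measured":
      return `Measured: ${claim.text} = ${claim.value} (source: ${claim.source})`;
    case "observed":
      return `Observed: ${claim.text} (source: ${claim.source})`;
    case "inferred":
      return `Inferred: ${claim.text} (interpretation based on ${claim.basedOn.length} signal(s))`;
  }
}

const claims: Claim[] = [
  { label: "measured", text: "Image alt coverage", value: 0.6, source: "rendered capture" },
  { label: "observed", text: "Primary CTA is visible above the fold", source: "rendered capture" },
  { label: "inferred", text: "First-time visitors may see the offer before the proof", basedOn: ["cta-position", "proof-position"] },
];

const reportLines = claims.map(renderClaim);
// reportLines[0] is "Measured: Image alt coverage = 0.6 (source: rendered capture)"
```

With this shape, attaching a `value` to an inferred claim is a compile-time error rather than a review-time catch.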

Agentic layer

The AI is downstream by design.

This is a hybrid workflow-agent system. The deterministic shell owns capture, scoring, persistence, status, and report assembly. The LLM layer owns summary, prioritization, explanation, and prospect intelligence. The Prospect Audit Agent does not browse freely, rewrite findings, or create scores.

Deterministic Audit Engine → Accepted Findings → Prospect Audit Agent → Internal Intelligence

What the LLM can and cannot do

Allowed

Summarize accepted findings.
Prioritize recommendations.
Translate audit evidence into internal prospect intelligence.
Explain why a finding matters commercially.

Blocked

Create audit findings.
Modify category scores.
Invent metrics.
Present inferred claims as measured truth.
Make unsupported revenue claims.
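
The allowed and blocked lists above can be enforced after the model responds, not only requested in the prompt. The following is a rough sketch under stated assumptions: AcceptedFinding, Synthesis, and checkSynthesis are hypothetical names, and the "invented metric" test is a deliberately crude heuristic (any number in the model's summary that does not appear in the accepted finding is flagged).

```typescript
// Hypothetical shapes for the boundary between the rule engine and the LLM layer.
interface AcceptedFinding {
  id: string;
  message: string;
}

interface SynthesisItem {
  findingId: string;
  summary: string;
  priority: number;
}

interface Synthesis {
  items: SynthesisItem[];
  scores?: Record<string, number>; // should never be present in model output
}

const NUMBER = /\d+(?:\.\d+)?/g;

// Returns a list of guardrail violations; an empty list means the output is usable.
function checkSynthesis(accepted: AcceptedFinding[], output: Synthesis): string[] {
  const violations: string[] = [];
  const byId = new Map<string, AcceptedFinding>(
    accepted.map((f): [string, AcceptedFinding] => [f.id, f]),
  );

  for (const item of output.items) {
    const finding = byId.get(item.findingId);
    if (finding === undefined) {
      violations.push(`creates finding: ${item.findingId}`);
      continue;
    }
    // Crude heuristic (assumption): numbers must already exist in the accepted finding.
    const known = new Set<string>(finding.message.match(NUMBER) ?? []);
    for (const n of item.summary.match(NUMBER) ?? []) {
      if (!known.has(n)) violations.push(`invents metric ${n} for ${item.findingId}`);
    }
  }

  if (output.scores !== undefined) {
    violations.push("modifies scores: category scores belong to the rule engine");
  }
  return violations;
}

const acceptedFindings: AcceptedFinding[] = [
  { id: "a11y-alt-coverage", message: "6 of 10 images have alt text." },
];

const violations = checkSynthesis(acceptedFindings, {
  items: [
    { findingId: "a11y-alt-coverage", summary: "Only 6 of 10 images have alt text.", priority: 1 },
    { findingId: "a11y-alt-coverage", summary: "Fixing this could lift conversions by 20%.", priority: 2 },
    { findingId: "ux-made-up", summary: "The hero feels dated.", priority: 3 },
  ],
  scores: { SEO: 95 },
});
// violations has three entries: an invented metric, a created finding, and a score change
```

A real check would likely need something stricter than number matching, but even this version rejects the unsupported "20%" revenue-style claim before it reaches a report.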

Report surface

A private audit report built from accepted evidence

The report surface is intentionally internal. It presents scores, confidence, evidence labels, and top findings without exposing the private Vercel deployment as a public demo.

Issue

Primary CTA is visible but weakly reinforced above the fold.

Category: UX / Conversion
Evidence: Observed
Source: Rendered browser capture

Allowed: based on accepted evidence only.

Representative finding card. Not a live audit result.

Sample audit output

Synthetic/anonymized example based only on public-page evidence.

Measured evidence: Page title length, heading count, image alt coverage, and response status are collected directly from the public page.
Observed evidence: The primary CTA is visible above the fold, but supporting proof appears lower on the page.
Inferred evidence: A first-time visitor may understand the offer before they understand why it is credible.
Recommendation: Move one proof cue closer to the hero CTA and keep the claim tied to visible page evidence.
Caveat: No private analytics, conversion rate, CRM data, or proprietary performance data is available to this workflow.
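
The measured items in the sample above are the kind of values that can be computed directly from captured HTML. As a hedged illustration, the sketch below derives title length, heading count, and image alt coverage with regular expressions. measureHtml and MeasuredEvidence are hypothetical names, regex parsing is a simplification that a real pipeline would probably replace with a DOM query, and response status is omitted because it comes from the HTTP response rather than the markup.

```typescript
interface MeasuredEvidence {
  titleLength: number;
  headingCount: number;
  imageCount: number;
  imagesWithAlt: number;
  altCoverage: number | null; // null when the page has no images
}

// Naive regex-based measurement; good enough to show the idea, not to parse arbitrary HTML.
function measureHtml(html: string): MeasuredEvidence {
  const title = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  const titleLength = (title?.[1] ?? "").trim().length;

  // Matches <h1> through <h6>, with or without attributes.
  const headingCount = (html.match(/<h[1-6][\s>]/gi) ?? []).length;

  const images = html.match(/<img\b[^>]*>/gi) ?? [];
  // Assumption: only a non-empty alt attribute counts as coverage.
  const hasAlt = /\salt\s*=\s*(?:"\s*[^"\s][^"]*"|'\s*[^'\s][^']*')/i;
  const imagesWithAlt = images.filter((tag) => hasAlt.test(tag)).length;

  return {
    titleLength,
    headingCount,
    imageCount: images.length,
    imagesWithAlt,
    altCoverage: images.length === 0 ? null : imagesWithAlt / images.length,
  };
}

const sampleHtml = `<html><head><title> Acme Widgets </title></head><body>
<h1>Acme</h1><h2 class="sub">Why us</h2>
<img src="a.png" alt="Logo"><img alt="" src="b.png"><img src="c.png">
</body></html>`;

const measured = measureHtml(sampleHtml);
// measured.titleLength is 12, measured.headingCount is 2,
// measured.imageCount is 3, measured.imagesWithAlt is 1
```

Note that an empty alt attribute is valid for decorative images, so treating it as missing coverage is a scoring choice, not a fact about the page.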

Safety / Access boundary

Public repository. Private operating surface.

The project is public for portfolio and reference purposes, while the operating surface stays private behind an internal login and protected job endpoint.

Private Vercel deployment
Internal login and access gate
Worker-secret protected job endpoint
Public website evidence only
No anti-bot bypass
No public live demo
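
A worker-secret check on the job endpoint might be sketched as follows. This is an assumption about the general pattern, not the project's implementation: the header name x-worker-secret and both function names are hypothetical. The comparison avoids early exit on the first mismatched character and fails closed when no secret is configured.

```typescript
// Compares two strings without returning early on the first differing character.
function constantTimeEqual(a: string, b: string): boolean {
  // Length is checked up front; the loop protects the secret's content, not its length.
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// Hypothetical gate for the job endpoint; the header name is an assumption.
function isAuthorizedWorkerRequest(
  headers: Record<string, string | undefined>,
  expectedSecret: string,
): boolean {
  // Fail closed: a missing or empty configured secret authorizes nobody.
  if (expectedSecret.length === 0) return false;
  const provided = headers["x-worker-secret"];
  return provided !== undefined && constantTimeEqual(provided, expectedSecret);
}

const allowed = isAuthorizedWorkerRequest({ "x-worker-secret": "s3cret-value" }, "s3cret-value");
const denied = isAuthorizedWorkerRequest({}, "s3cret-value");
// allowed is true, denied is false
```

In a Node.js runtime, `crypto.timingSafeEqual` over equal-length buffers is the usual choice for the comparison step; the hand-written loop is used here only to keep the sketch dependency-free.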

Result

A useful AI system because it stays inside its lane.

The result is a private acquisition workflow that can turn a public website URL into a structured internal report with evidence labels, category scores, and prioritized findings.

Repeatable audit pipeline

A public URL moves through capture, evidence storage, scoring, synthesis, and reporting with explicit stage boundaries.

Controlled AI synthesis

The LLM summarizes accepted findings and prospect context without creating audit truth.

Evidence discipline

Measured, observed, and inferred claims stay distinct throughout the report.

Portfolio-safe architecture

The repo can be shown publicly while the operating surface and generated reports remain private.

Stack

Built as a production-oriented prototype

The stack is practical: Next.js for the app surface, Postgres and pg-boss for durable runs, Playwright for capture, Gemini for bounded synthesis, and a worker endpoint protected separately from the internal session gate.

Next.js
TypeScript
Postgres
pg-boss
Playwright
Gemini
Vercel
Deterministic scoring
Evidence labels
Private worker endpoint

Boundaries / Future Improvements

These are design boundaries, not excuses.

The current scope is deliberately narrow so the case study does not overclaim what the system does.

Not a public SaaS.
No public live demo.
Prospect Intelligence is internal guidance, not audit truth.
Static-only reports intentionally exclude visual/mobile/above-the-fold scoring.
AI synthesis depends on accepted findings.
Future work includes evals, model comparison, observability, and real audit examples.
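
The static-only boundary in the list above can be expressed as a filter over rules. In this minimal sketch each rule declares which capture mode it needs, and rendered-only rules are reported as skipped rather than scored. RuleSpec, selectRules, and the rule ids are hypothetical.

```typescript
type CaptureMode = "rendered" | "static";

interface RuleSpec {
  id: string;
  needs: "any" | "rendered"; // "rendered" marks visual, mobile, or above-the-fold rules
}

// Splits rules into those that may run for this capture mode and those that must be skipped.
function selectRules(rules: RuleSpec[], mode: CaptureMode): { run: RuleSpec[]; skipped: RuleSpec[] } {
  const run = rules.filter((r) => r.needs === "any" || mode === "rendered");
  const skipped = rules.filter((r) => r.needs === "rendered" && mode === "static");
  return { run, skipped };
}

const ruleSpecs: RuleSpec[] = [
  { id: "seo-title-length", needs: "any" },
  { id: "ux-cta-above-fold", needs: "rendered" },
];

const staticSelection = selectRules(ruleSpecs, "static");
// staticSelection.run holds only seo-title-length; ux-cta-above-fold is in skipped
```

Returning the skipped rules explicitly lets a report say "not scored" instead of implying a zero.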

Why It Matters

Reliability beats novelty

AI audits become risky when they collapse evidence and interpretation into one model response. This system defines clear inputs, accepted evidence, deterministic scoring, and reviewable AI synthesis, which makes the recommendations more useful for teams that need repeatable diagnostics.

Client Relevance

Where this becomes useful

A client-facing version could help brand, content, product, or digital teams standardize site audits, speed up UX/SEO/conversion diagnostics, and turn messy website observations into clearer decision support.

Discuss a Similar AI System

Have a creative system worth extending?

If your team has a creative process, internal tool, campaign workflow, or brand system worth extending with AI, send a short brief and I'll help define the clearest system logic.
