
How I ship software with agents

Working software shipped by an agent harness that refuses to lie about done. Every feature passes type-checks, tests, and a production build before it can call itself finished.

I'm not a software engineer. I'm an operator who builds working systems with AI. The honesty here is enforced by machinery, not trust.

Enforcement floor: armed · spec → ship · 7 gates · building

Pipeline: ✓ spec · ✓ plan · build 8/14 → 9/14 · review · QA · security · ship

Wave w-23 · elapsed 11:40

Enforcement floor: 0.99¹⁰⁰ (100 steps) ≈ 37%

A 99%-reliable-per-step agent finishes a 100-step feature correctly about a third of the time. Gates fix that, not a smarter prompt.

Task board · 14 in wave · ✓ 8 → 9 · ◐ 3 · ⊘ 1 · ◌ 1

T-04: schema + RLS migration (w1, code-done)
T-07: save / unsave server actions (w2, code-done)
T-09: library list query · DAL (w3, coding)
T-12: saved-state hydration (w2, coding → code-done)
T-15: OG image · shared library route (w4, coding → blocked; needs: asset spec from design)
T-18: empty-state + a11y pass (w1, to-do → coding)
T-21: library pagination (to-do)

Gate console

harness ▸ T-12 · saved-state hydration · enforce done-gate

tsc --noEmit: 0 errors
vitest run: 103 / 103 passed
next build: compiled · 41s
git commit: b7e2c14 · 6 files

done-gate ▸ passed: task is done

Gate history: 0 failed · forced-green forbidden

This build (captured run · this harness)

99% per-step reliability
~37% unenforced 100-step success
7 pipeline gates
Done-gate: tsc · tests · build

How it works

Reliability comes from enforcement, not better prompts

An agent that's reliable per step still fails a long feature without enforcement. You don't fix that with a smarter prompt. You fix it with gates the agent cannot talk past.
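The 37% figure is plain compounding, and it can be checked in a few lines. The sketch below assumes every step succeeds independently with the same probability, which is a simplification. The `withRetry` variant is a further idealization and not a claim about this harness: it assumes a gate catches every failed step and allows exactly one retry. Both function names are illustrative.

```typescript
// Probability that all `steps` succeed when each one independently
// succeeds with probability `perStep` (a simplifying assumption).
function chainSuccess(perStep: number, steps: number): number {
  return Math.pow(perStep, steps);
}

// Idealized gate (assumption): every failed step is detected and retried
// up to `attempts` times in total, so a step fails only if all attempts fail.
function withRetry(perStep: number, attempts: number): number {
  return 1 - Math.pow(1 - perStep, attempts);
}

const unenforced = chainSuccess(0.99, 100);
const gated = chainSuccess(withRetry(0.99, 2), 100);

console.log(`unenforced: ${(unenforced * 100).toFixed(1)}%`); // unenforced: 36.6%
console.log(`gated:      ${(gated * 100).toFixed(1)}%`);      // gated:      99.0%
```

The per-step number barely changes. What changes is that a failure is caught at the step where it happens instead of compounding through the rest of the feature.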

01
A gated pipeline
Every feature runs spec → plan → build ⇄ review → QA → security → ship. Each phase produces an artifact on disk, gated before the next begins.
02
Hooks, not the honor system
Destructive commands are blocked, protected files are locked, database security is required on every change, and a done-gate runs type-check, tests, and a production build. A rule in a prompt drifts; a hook holds.
03
Honest status, fresh-eyes review
Workers report done, done-with-concerns, blocked, or needs-context. Blocked is a valid answer; forced-green is forbidden. A reviewer that didn't write the code reads every diff first.
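One way to picture the pipeline and the four statuses together is a hand-off check: a phase advances only if the worker reports a finished status and the artifact it claims is really there. This is a minimal sketch, not the harness's actual code. The names `PhaseReport`, `nextPhase` and `artifactExists`, and the artifact paths, are hypothetical, and the build ⇄ review loop is flattened into a straight sequence for brevity.

```typescript
// Phase order follows the pipeline described above (build/review loop flattened).
const PHASES = ["spec", "plan", "build", "review", "qa", "security", "ship"] as const;
type Phase = (typeof PHASES)[number];

// The four statuses a worker may report, as named in the text.
type WorkerStatus = "done" | "done-with-concerns" | "blocked" | "needs-context";

interface PhaseReport {
  phase: Phase;
  status: WorkerStatus;
  artifactPath: string; // hypothetical path to the artifact this phase wrote
}

// Returns the phase to run next, or null when nothing may run next:
// either the worker did not finish, the artifact is missing, or "ship" was the last phase.
// `artifactExists` is injected so the check can wrap any filesystem lookup.
function nextPhase(
  report: PhaseReport,
  artifactExists: (path: string) => boolean,
): Phase | null {
  const finished = report.status === "done" || report.status === "done-with-concerns";
  if (!finished) return null; // blocked or needs-context: stop, never force green
  if (!artifactExists(report.artifactPath)) return null; // a claim without an artifact does not count
  const next = PHASES[PHASES.indexOf(report.phase) + 1];
  return next ?? null;
}

// Usage with an in-memory stand-in for the disk.
const onDisk = new Set(["artifacts/spec.md"]);
const exists = (path: string) => onDisk.has(path);

console.log(nextPhase({ phase: "spec", status: "done", artifactPath: "artifacts/spec.md" }, exists)); // plan
console.log(nextPhase({ phase: "plan", status: "blocked", artifactPath: "artifacts/plan.md" }, exists)); // null
```

Modelling the status as a closed union means a worker cannot invent a fifth answer, and "blocked" stops the pipeline through the same path as any other unfinished state.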

Proof

The harness, and what it shipped

The build engine as a repository, alongside the products it produced — this site among them. Repos to read, not screenshots.

Repository evidence is being packaged for publication.

What runs it

Claude Code: the agent harness

Spec · plan · review · QA · security: phase artifacts on disk

Git pre-commit hooks: the enforcement floor

Type-check · tests · build: the done-gate

Next.js · Supabase · Vercel: the shipping stack
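The done-gate can be thought of as a short list of commands run in order, stopping at the first non-zero exit code. The sketch below shows that shape only. The three commands match the gate console above, but the real harness's flags, ordering and wiring into Git hooks are assumptions, and `Check`, `Runner` and `runDoneGate` are hypothetical names. The command runner is injected so the logic can be exercised without a project on disk.

```typescript
// One check in the gate: a label plus the command line it stands for.
interface Check {
  name: string;
  command: string;
  args: string[];
}

// Commands as shown in the gate console; exact flags in the real harness are an assumption.
const DONE_GATE: Check[] = [
  { name: "type-check", command: "tsc", args: ["--noEmit"] },
  { name: "tests", command: "vitest", args: ["run"] },
  { name: "build", command: "next", args: ["build"] },
];

// A runner executes a command and returns its exit code (0 means success).
type Runner = (command: string, args: string[]) => number;

interface GateResult {
  passed: boolean;
  failedAt: string | null; // name of the first failing check, if any
  ran: string[];           // checks that were actually executed, in order
}

// Runs checks in order and stops at the first failure; later checks never run.
function runDoneGate(checks: Check[], run: Runner): GateResult {
  const ran: string[] = [];
  for (const check of checks) {
    ran.push(check.name);
    if (run(check.command, check.args) !== 0) {
      return { passed: false, failedAt: check.name, ran };
    }
  }
  return { passed: true, failedAt: null, ran };
}

// Usage with a fake runner that simulates a failing test suite.
const result = runDoneGate(DONE_GATE, (command) => (command === "vitest" ? 1 : 0));
console.log(result.passed, result.failedAt); // false tests
```

In Node, a real runner could wrap `spawnSync` from `node:child_process` and return its `status`, treating a `null` status as a failure. There is deliberately no override parameter: the only way to get `passed: true` is for every command to exit with 0.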

What your team gets

Features that are actually done when they say they're done. A delivery system where the non-negotiables are enforced, not hoped for, and an operator who built it and runs it daily.
