How I ship software with agents
Working software shipped by an agent harness that refuses to lie about done. Every feature passes type-checks, tests, and a production build before it can call itself finished.
I'm not a software engineer. I'm an operator who builds working systems with AI. The honesty here is enforced by machinery, not trust.
0.99100 steps37%
A 99%-reliable-per-step agent finishes a 100-step feature correctly about a third of the time. Gates fix that, not a smarter prompt.
harness ▸ T-12 · saved-state hydration · enforce done-gate
done-gate ▸ passed: task is done
harness ▸
An agent's per-step reliability compounds to roughly a third over a long feature; a gated pipeline with a type-check, tests, and production-build done-gate is the fix.
Reliability comes from enforcement, not better prompts
An agent that's reliable per step still fails a long feature without enforcement. You don't fix that with a smarter prompt. You fix it with gates the agent cannot talk past.
- 01
A gated pipeline
Every feature runs spec → plan → build ⇄ review → QA → security → ship. Each phase produces an artifact on disk, gated before the next begins.
- 02
Hooks, not the honor system
Destructive commands blocked, protected files locked, database security required on every change, and a done-gate that runs type-check, tests, and a production build. A rule in a prompt drifts; a hook holds.
- 03
Honest status, fresh-eyes review
Workers report done, done-with-concerns, blocked, or needs-context. Blocked is a valid answer, forced-green is forbidden. A reviewer that didn't write the code reads every diff first.
The harness, and what it shipped
The build engine as a repository, alongside the products it produced — this site among them. Repos to read, not screenshots.
What your team gets
Features that are actually done when they say they're done. A delivery system where the non-negotiables are enforced, not hoped for, and an operator who built it and runs it daily.