THE HINGE
Cheap to Make, Expensive to Check
Generative AI drove the cost of making work toward zero and left the cost of checking it where it found it. Whether you stay the director or become a limb turns on a single question: does a cheap verifier exist?
THE GATE
The Generation-Verification Gap
- MIT Sloan: cheap to produce, no cheaper to judge
- Naive checking re-runs the same expensive inference
- Offload only when cost(check) is far below cost(make)
This is the generation-verification gap, the economic gate beneath every offload and the live edge of where convexity is bankable.
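The gate can be written down as a one-line rule. What follows is a minimal sketch, not a calibrated model: the `Task` fields, the `should_offload` name, and the 0.25 margin are all illustrative assumptions, and the costs can be in any single unit (minutes, dollars) as long as both use the same one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """A unit of work with two costs, in the same (arbitrary) unit."""
    name: str
    cost_make: float   # cost to produce the work without the model
    cost_check: float  # cost to verify a model-produced version


def should_offload(task: Task, margin: float = 0.25) -> bool:
    """Return True only when checking is far cheaper than making.

    `margin` is a hypothetical cutoff: the check must cost at most
    this fraction of the make cost. Pick it per team; 0.25 is a guess.
    """
    if task.cost_make <= 0:
        return False  # nothing to save, so nothing to offload
    return task.cost_check <= margin * task.cost_make


boilerplate = Task("CRUD endpoint with a test suite", cost_make=60, cost_check=5)
judgment = Task("security-sensitive auth change", cost_make=60, cost_check=55)

print(should_offload(boilerplate))  # check is cheap relative to make
print(should_offload(judgment))     # check costs nearly as much as make
```

The rule is deliberately a ratio rather than a difference: a task whose check costs almost as much as its make stays with the human no matter how small both numbers are.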
THE LAST THIRTY
The 70% Problem
The speed was never the obstacle. The last thirty percent is the out-of-distribution work a model should decline, the cost-side reading of the 70% problem.
- First 70%: fast, cheap, boilerplate
- Last 30%: edge cases, security, judgment
- An eager junior who needs constant supervision (Osmani 2024)
REVIEW IS THE WORK
Checking is not what comes after the work. It is the work.
Review is not a rubber stamp but the actual labour, and when the cost of checking nears the cost of making, oversight becomes re-work.
WHERE ORACLES LIVE
Verifiable Space
- AlphaEvolve: a proof rules on each construction
- GNoME: physics computes stability
- AlphaProteo: a wet lab measures binding
- The line is checkable versus not
AI replaces search, not judgment. It wins inside verifiable space, where a cheap oracle gives every proposal a yes or no against hard ground truth.
SAME TOOL, TWO FATES
One Model, Opposite Fates
- Director side: customer-support novices +34% productivity, average agent +14% (Brynjolfsson, Li & Raymond 2023)
- Limb side: experts 19% slower, felt 20% faster (METR 2025)
- What divides them: whether a cheap verifier exists
The felt-versus-measured gap is its own trap, and where the check goes unpaid, the worker sinks into the centaur's hindquarters.
THE SHELF
The Director-or-Limb Shelf
The flip is the reverse centaur: same human in the loop, opposite locus of judgment. When checking costs too much, the human becomes the centaur's hindquarters.
- Each task carries two bars: one to make, one to check
- Below the flip line, a cheap-oracle stamp and an upright centaur
- Above it, checking costs more than making and the rider becomes a reverse centaur
BUILD THE CHECK
If No Oracle Exists, Build One
- 01 Seal the threshold before you look: positive results fell from 96% to 44% (Scheel et al. 2021)
- 02 Run an adversarial second pass through a fresh model with no shared memory
- 03 Bring an outside reviewer through on a fixed clock to re-check one piece cold
Independence is the load-bearing variable, not headcount. Rebuild it with the three lines of defense: a sealed threshold, an adversarial second pass, a periodic outside reviewer.
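One way to make "seal the threshold before you look" mechanical is a hash commitment: publish a digest of the pass/fail criteria before the results exist, then reveal the criteria afterwards so anyone can confirm they were not moved. The sketch below assumes that approach; the `seal` and `verify` names and the shape of the `threshold` dictionary are hypothetical, and only Python standard-library calls are used.

```python
import hashlib
import hmac
import json
import secrets


def _digest(threshold: dict, nonce: str) -> str:
    # Canonical JSON (sorted keys) so the same criteria always hash the same.
    payload = json.dumps(threshold, sort_keys=True) + nonce
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def seal(threshold: dict) -> tuple[str, str]:
    """Commit to the criteria before looking at any results.

    Returns (commitment, nonce). Publish the commitment now; keep the
    nonce private until the reveal. The random nonce stops anyone from
    guessing a simple threshold by brute force.
    """
    nonce = secrets.token_hex(16)
    return _digest(threshold, nonce), nonce


def verify(threshold: dict, nonce: str, commitment: str) -> bool:
    """Check that the revealed criteria match what was sealed."""
    return hmac.compare_digest(_digest(threshold, nonce), commitment)


# Illustrative criteria, sealed before the evaluation runs.
criteria = {"metric": "pass_rate", "minimum": 0.90}
commitment, nonce = seal(criteria)

# After the results are in: the original criteria verify, a moved bar does not.
print(verify(criteria, nonce, commitment))
print(verify({"metric": "pass_rate", "minimum": 0.80}, nonce, commitment))
```

The commitment covers only the first line of defense. The adversarial second pass and the outside reviewer still need people or models that share no memory with whoever made the work.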