VOL. I · NODE 005 · ATLAS

THE HINGE

Cheap to Make, Expensive to Check

Generative AI drove the cost of making work toward zero and left the cost of checking it where it found it. Whether you stay the director or become a limb turns on a single question: does a cheap verifier exist.

THE GATE

The Generation-Verification Gap

[Figure: produce and verify cost bars. Safe when level; unsafe to offload when cost(verify) > cost(produce).]
  • MIT Sloan: cheap to produce, no cheaper to judge
  • Naive checking re-runs the same expensive inference
  • Offload only when cost(check) is far below cost(make)

This is the generation-verification gap, the economic gate beneath every offload and the live edge of where convexity is bankable.

THE LAST THIRTY

The 70% Problem

[Figure: difficulty against progress through the job. From 0% to 70% the machine is fast and the work looks finished; from 70% to 100% the human does the hard part. The cheap part was never the hard part.]

Speed was never the obstacle. The last thirty percent is the out-of-distribution work a model should decline; that is the cost-side reading of the 70% problem.

  • First 70%: fast, cheap, boilerplate
  • Last 30%: edge cases, security, judgment
  • An eager junior who needs constant supervision (Osmani 2024)

REVIEW IS THE WORK

Checking is not what comes after the work. It is the work.

Not a rubber stamp but the actual labour. When the cost of checking nears the cost of making, oversight becomes re-work.

WHERE ORACLES LIVE

Verifiable Space

[Figure: candidates from a search space pass through a verifier (proof, physics, or wet lab) and come out verified or discarded.]
  • AlphaEvolve: a proof rules on each construction
  • GNoME: physics computes stability
  • AlphaProteo: a wet lab measures binding
  • The line is checkable versus not

AI replaces search, not judgment. It wins inside verifiable space, where a cheap oracle gives every proposal a yes or no against hard ground truth.

SAME TOOL, TWO FATES

One Model, Opposite Fates

[Figure: one model, opposite fates. Cheap check present: director (novices +34%, average +14%). No cheap check: limb (experts −19% measured, +20% felt).]
  • Director side: support novices +34%, average +14% (Brynjolfsson, Li & Raymond 2023)
  • Limb side: experts 19% slower, felt 20% faster (METR 2025)
  • What divides them: does a cheap verifier exist

The felt-versus-measured gap is its own trap, and where the check goes unpaid, the worker sinks into the centaur's hindquarters.

THE SHELF

The Director-or-Limb Shelf

[Figure: tasks ordered by rising verify cost, each with a produce bar and a verify bar. Below the flip line (verify = produce): cheap oracle, centaur. Above it: reverse-centaur.]

The flip is the reverse centaur: same human in the loop, opposite locus of judgment. When checking costs too much, the human becomes the centaur's hindquarters.

  • Each task carries two bars: one to make, one to check
  • Below the flip line, a cheap-oracle stamp and an upright centaur
  • Above it, checking costs more than making and the rider becomes a reverse-centaur

BUILD THE CHECK

If No Oracle Exists, Build One

  1. Seal the threshold before you look: positive results fell from 96% to 44% (Scheel et al. 2021)
  2. Run an adversarial second pass through a fresh model with no shared memory
  3. Bring an outside reviewer through on a fixed clock to re-check one piece cold

Independence is the load-bearing variable, not headcount. Rebuild it with the three lines of defense: a sealed threshold, an adversarial second pass, a periodic outside reviewer.

[Figure: three lines of defense (operate, oversee, verify); a solo operator runs all three at once. Rebuilt across time (sealed threshold), stance (adversarial pass), and people (external review).]

DIRECTOR OR LIMB

The Question That Decides Your Role

  • Not 'can AI do it' but 'can you cheaply check it'
  • Cheap oracle: you stay the director
  • No oracle: build the check first, or keep the seat

Cheap to make, expensive to check: only the checkable upside is bankable.

Read the transcript

01 · THE HINGE

A model writes the quarterly report in nine seconds. Reading it closely enough to trust it takes the analyst an hour. The machine made the words almost free. It did nothing to the price of knowing whether the words are true. That split, cheap to make and dear to check, is the quiet hinge the whole era turns on.

02 · THE GATE

MIT Sloan stated it without flinching: AI makes work cheap to produce and no cheaper to judge. The naive way to check a model's output is to run the same costly inference again, and output tokens are often priced several times higher than input tokens. So the gate is an inequality, not a feeling: hand a task over only when checking it costs far less than doing it yourself.
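
The gate can be written down as a one-line test. What follows is a minimal sketch, not a method from the source: the function name `safe_to_offload` and the `margin` cutoff (how small "far less" has to be) are assumptions, and the costs are illustrative minutes of the worker's own time.

```python
def safe_to_offload(cost_make: float, cost_check: float, margin: float = 0.25) -> bool:
    """Return True only when checking costs far less than making.

    `margin` is a hypothetical cutoff: the check must cost at most this
    fraction of doing the task yourself for the task to pass the gate.
    """
    if cost_make <= 0:
        raise ValueError("cost_make must be positive")
    return cost_check <= margin * cost_make


# Illustrative numbers only, in minutes.
print(safe_to_offload(cost_make=120, cost_check=5))   # a test suite does the checking
print(safe_to_offload(cost_make=120, cost_check=60))  # checking means a close read
```

The first call passes the gate and the second does not. Where to set `margin` is a judgment call; the only point the sketch makes is that the decision compares two costs rather than asking whether the model can produce the output.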

03 · THE LAST THIRTY

Addy Osmani named the shape in 2024. Across the bar, the first seventy percent fills in a fraction of the usual time: scaffolding, the obvious patterns, the boilerplate. Then the line turns. The last thirty climbs steeply, the edge cases, the security, the judgment, every bit as hard as it ever was. He calls the model an eager junior developer who needs constant supervision. The cheap part was never the hard part.

04 · REVIEW IS THE WORK

Reviewing the machine's output gets booked as a free action, a glance laid on top of the real labour. It is the costly step itself. When checking a thing costs nearly as much as making it, the human in the loop stops being oversight and becomes unpaid re-work at machine speed. We keep entering the expensive part in the ledger as nothing.

05 · WHERE ORACLES LIVE

The wins that genuinely convince share one shape: a model paired with a cheap external check over a searchable space. AlphaEvolve writes mathematical constructions, and a proof rules on each one. GNoME proposes crystals, and physics computes whether they hold. AlphaProteo designs protein binders, and a wet lab measures whether they bind. The dividing line is not clever AI against clumsy AI. It runs between the checkable and the rest.
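
The shared shape, a proposer feeding a cheap yes-or-no check, can be sketched with a toy stand-in. This is an assumption-laden illustration, not how AlphaEvolve, GNoME, or AlphaProteo work: the candidates here are factor pairs of a small number, and the oracle is a single multiplication. The names `search_with_oracle` and `is_factor_pair` are hypothetical.

```python
from itertools import combinations_with_replacement
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[int, int]


def search_with_oracle(candidates: Iterable[Pair],
                       verify: Callable[[Pair], bool]) -> List[Pair]:
    """Keep the candidates the oracle accepts and discard the rest."""
    return [c for c in candidates if verify(c)]


N = 391  # toy target: finding its factors is a search, checking a pair is one multiply


def is_factor_pair(pair: Pair) -> bool:
    # The cheap oracle: a hard yes or no against ground truth.
    return pair[0] * pair[1] == N


# The "proposer" here is brute force over pairs (a, b) with 2 <= a <= b <= 30.
candidates = combinations_with_replacement(range(2, 31), 2)
print(search_with_oracle(candidates, is_factor_pair))  # [(17, 23)]
```

The proposer could be swapped for anything, including a model, without touching the verifier. That separation is what makes the output trustworthy regardless of how the candidates were produced.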

06 · SAME TOOL, TWO FATES

Set two workers beside the same model. Customer-support agents gained fourteen percent on average, thirty-four for the novices, close to nothing for the experts. A 2025 randomised trial by METR put experienced developers nineteen percent slower with the tool, while they swore they were twenty percent faster. One model, opposite fates. What separates them is whether a cheap check stands between the worker and being wrong.

07 · THE SHELF

Stand every task on a shelf with two bars, one for making it and one for checking it, and sort the shelf by the checking bar. Down at the bottom the checking bar is short. A cheap oracle stamps the work, the tests pass, the numbers tie, and you ride as the centaur, a reasoning head on a tireless body. Climb the shelf and the checking bar overtakes the making bar. The stamp falls away. The rider inverts. Now the machine holds the head and you are the limb.
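
The shelf can be sketched as a sort plus a comparison. The task names and cost numbers below are invented for illustration, and treating a tie (verify equal to produce) as already flipped is an assumption; the source only places the flip line at verify = produce.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    produce: float  # cost to make, in arbitrary units
    verify: float   # cost to check, in the same units

    @property
    def role(self) -> str:
        # Below the flip line the human keeps the judgment; at or above it
        # (an assumption for the tie case) the rider inverts.
        return "centaur" if self.verify < self.produce else "reverse-centaur"


def build_shelf(tasks: List[Task]) -> List[Task]:
    """Sort the shelf by the checking bar, cheapest check at the bottom."""
    return sorted(tasks, key=lambda t: t.verify)


# Hypothetical tasks with made-up costs.
tasks = [
    Task("strategy memo", produce=1, verify=8),
    Task("unit-tested function", produce=5, verify=1),
    Task("reconciled ledger", produce=6, verify=2),
]

for task in build_shelf(tasks):
    print(f"{task.name}: {task.role}")
```

With these numbers the unit-tested function and the reconciled ledger sit at the bottom as centaur work, and the strategy memo lands at the top as reverse-centaur work.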

08 · BUILD THE CHECK

When no cheap verifier exists, the task need not stay human forever, but build the check before you collapse the seat. The Three Lines of Defense model names the trap: a solo operator runs the first line, the second, and the third at once, so independent verification cannot exist by design. Rebuild it cheaply, across time, stance, and people: seal the threshold before you look, run an adversarial second pass through a fresh model, and bring in an outside reviewer on a fixed clock. In Scheel et al. (2021), studies that sealed their success criteria before looking reported positive results forty-four percent of the time, against ninety-six percent for those that did not.
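
One cheap way to seal a threshold across time is to hash the success criteria before any result is seen and refuse to judge against criteria that no longer match. The sketch below is one possible mechanism, not something the source prescribes; `seal`, `judge`, and the `min_accuracy` criterion are hypothetical names chosen for the example.

```python
import hashlib
import json


def seal(criteria: dict) -> str:
    """Hash the success criteria so they cannot be quietly moved later."""
    canonical = json.dumps(criteria, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def judge(criteria: dict, sealed_digest: str, observed: float) -> bool:
    """Apply the criteria only if they still match what was sealed."""
    if seal(criteria) != sealed_digest:
        raise ValueError("criteria changed after sealing")
    return observed >= criteria["min_accuracy"]


# Hypothetical criterion, fixed before looking at any result.
criteria = {"metric": "accuracy", "min_accuracy": 0.90}
digest = seal(criteria)  # record this somewhere you cannot edit later

print(judge(criteria, digest, observed=0.87))  # False: the sealed bar was 0.90
```

Lowering `min_accuracy` to 0.85 after seeing the 0.87 would change the digest, so `judge` raises instead of returning a flattering True. The seal only helps if the digest is stored where the operator cannot rewrite it, which is where the second and third lines of defense come in.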

09 · DIRECTOR OR LIMB

The question that fixes your role is smaller than it sounds. Not whether AI can do the work; it can do the seventy percent. Whether a cheap check stands between you and being wrong. Where the oracle is cheap, you stay the director and the machine is the finest body you have ever worked with. Where it is not, you are a limb. Build the verifier, or keep the seat.
