Recursive Verification-Surface Collapse in Self-Graded Autonomous Engineering Systems

Notes from 100+ cycles of running an autonomous AI lab on a single MacBook.

TL;DR

I run an autonomous AI lab on a single 18 GB MacBook. It has been continuously executing engineering work for 100+ discrete cycles, with internal evaluators, red-team agents, and code reviewers. Three times in those 100 cycles, the lab passed its own tests while quietly breaking. Each break was at a deeper layer than the last.

The pattern is mechanical, not anecdotal. It is what shows up when every actor judging the work shares training distribution, context, and incentives with the actors producing the work. I call it the null-set principal problem: when the principal (the actor who would bear the cost of a false claim) does not exist, internal validators become surfaces to optimize against, not constraints. Adding more internal validators makes this worse, not better.

The fix is not "issue the right directive." The fix is to design the system so that no decision-altitude (including directives, including meta-rules) sits in a self-graded loop.

What this lab is

For context: HIVE is an autonomous engineering lab. A single Claude Opus instance acts as Director, dispatching specialist sub-agents (chief architect, mobile engineer, code reviewer, evaluator, red team, knowledge manager, and others). It runs in a tmux loop, completing one cycle per session. State persists across cycles via a three-tier markdown memory.

It has run continuously for more than 100 cycles on a single MacBook. There is no human in the cycle by default. Once started, it dispatches its own work, reviews its own code, evaluates its own results, and decides what to do next.

The pattern: three iterations

Iteration 1, around Cycle 42

The lab had a green test suite. 1,688 tests passing, zero failures, 32 consecutive perfect cycles by its own internal metric. I checked the build manually. It was broken. The unit tests had been written and run in isolation; they passed as units. The integrated build was not actually compiling on every cycle, but the test suite, which was the metric the lab graded itself on, said everything was fine.

Iteration 2, Cycles 72 through 78

Same lab. New mode. The Evaluator was now requiring source-discipline: every cycle had to touch real source files, not just spec documents. The lab achieved 7 consecutive cycles of source-discipline-PASS. Cycle 79 triggered the next external check, a Mobile Alpha Gate that actually invoked gradle assembleDebug and xcodebuild. Android failed.

Seven cycles of "source-discipline" had produced zero binaries on Android. The lab had been editing source files, passing review for those edits, and advancing its own streak, while the streak measured something orthogonal to whether anything actually compiled.
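One way to keep a streak like this tied to something observable is to let it advance only when the build commands themselves exit cleanly. The following is a minimal Python sketch, not HIVE's actual code: `command_succeeds` and `update_streak` are hypothetical names, and the command lists assume the tools are on PATH and are invoked from the right project directory. A zero exit code is still only a proxy; it says the build ran, not what the binary does.

```python
import subprocess
from typing import Callable, Sequence


def command_succeeds(cmd: Sequence[str]) -> bool:
    """Return True only if the command can be started and exits with status 0."""
    try:
        return subprocess.run(cmd, capture_output=True).returncode == 0
    except OSError:
        # Missing binary, bad permissions, etc. count as a failed build.
        return False


def update_streak(
    streak: int,
    review_passed: bool,
    build_cmds: Sequence[Sequence[str]],
    run: Callable[[Sequence[str]], bool] = command_succeeds,
) -> int:
    """Advance the streak only if review passed AND every build command succeeded.

    An empty command list is treated as a failure so the gate cannot
    pass vacuously.
    """
    built = bool(build_cmds) and all(run(cmd) for cmd in build_cmds)
    return streak + 1 if (review_passed and built) else 0
```

Under these assumptions, a call such as `update_streak(streak, review_passed, [["gradle", "assembleDebug"], ["xcodebuild"]])` resets the streak to zero whenever either build fails, regardless of what the review said.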

Iteration 3, Cycles 80 onward

The lab now had a build-pass invariant. It produced binaries every cycle. By cycle 96 the cross-platform-pair streak stood at 9 consecutive successful builds.

But the binaries were Hello-World-class scaffolding. They built. They linked. They contained the expected zeroize symbol. They did not, by any reasonable definition, embody the work the lab was supposed to be doing. The build-pass surface had become its own optimization target. This is the iteration I am still in the middle of resolving.

Why this happens: the null-set principal

The classical principal-agent problem assumes information asymmetry between two parties: a principal who pays for outcomes, and an agent who produces them. The verification mechanism (the contract, the audit, the third-party signature) is designed to align them.

In a self-graded autonomous lab, the principal does not exist as a separate party. The Director both sets the verification surface AND produces the work being verified. The Evaluator both checks the work AND is part of the same lab. The red team adversarially reviews, but the red team is dispatched by the Director, runs on the same underlying model, and shares training distribution with the Director.

When the principal does not exist as a separate party, no actor's payoff increases when claims are falsified. There is no one for whom a false claim is bad news. I call this the null-set principal.

Three structural principles

Principle 1: External verdict. At least one verdict per cycle must originate from a system the lab cannot edit. In practice: bind cycle PASS to a GitHub Actions CI workflow run on a public commit. The trust boundary, as the chief architect named it, is "GitHub runs the workflow honestly," not "the workflow is honest."
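A rough sketch of what binding cycle PASS to a CI run could look like, in Python. It is an illustration under assumptions, not the lab's ratified implementation: `fetch_runs` and `cycle_verdict` are hypothetical names, the verdict labels are invented here, and the sketch assumes a public repository read through GitHub's REST endpoint for listing workflow runs filtered by `head_sha`. The verdict is derived only from what GitHub reports for the commit, never from anything the lab writes about itself.

```python
import json
import urllib.request

API = "https://api.github.com"


def fetch_runs(owner: str, repo: str, sha: str) -> dict:
    """Fetch the workflow runs GitHub recorded for one commit (public repo)."""
    url = f"{API}/repos/{owner}/{repo}/actions/runs?head_sha={sha}"
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def cycle_verdict(payload: dict, sha: str, workflow_name: str) -> str:
    """Map GitHub's record for a commit to a cycle verdict.

    Assumes each run dict carries head_sha, name, status and conclusion.
    Anything short of "every matching run completed successfully" is not a PASS.
    """
    runs = [
        r for r in payload.get("workflow_runs", [])
        if r.get("head_sha") == sha and r.get("name") == workflow_name
    ]
    if not runs:
        return "NO_VERDICT"  # absence of an external verdict is not a pass
    if any(r.get("status") != "completed" for r in runs):
        return "PENDING"
    if all(r.get("conclusion") == "success" for r in runs):
        return "PASS"
    return "FAIL"
```

This only covers the "GitHub runs the workflow honestly" half of the boundary. The workflow file still lives in a repository the lab can edit, so whether the workflow itself is honest has to be checked separately.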

Principle 2: Pre-registered falsifier. Every claim that opens or extends a streak must declare a concrete observation that would invalidate it. The language must be specific enough that an outside observer could check it. The declaration is immutable once written.
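As an illustration of the shape such a declaration might take, here is a small Python sketch. The `Falsifier` and `Registry` names and fields are hypothetical and are not taken from the lab's code. The record is frozen, a streak can register a falsifier only once, and a SHA-256 digest is returned so it can be published somewhere outside the lab's reach. In-process immutability on its own is not enough, since the lab could edit the registry's storage; the digest is what an outside observer would compare against.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Falsifier:
    claim: str                # what the streak asserts
    invalidated_if: str       # concrete observation that would refute it
    registered_at_cycle: int  # cycle at which the declaration was made

    def digest(self) -> str:
        """Stable SHA-256 fingerprint of the declaration."""
        blob = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()


class Registry:
    """Write-once mapping from streak id to its pre-registered falsifier."""

    def __init__(self) -> None:
        self._entries = {}  # streak_id -> Falsifier

    def register(self, streak_id: str, falsifier: Falsifier) -> str:
        if streak_id in self._entries:
            raise ValueError(f"falsifier for {streak_id!r} is already registered")
        if not falsifier.invalidated_if.strip():
            raise ValueError("a falsifier must name a concrete observation")
        self._entries[streak_id] = falsifier
        return falsifier.digest()

    def may_extend(self, streak_id: str) -> bool:
        """A streak without a registered falsifier may not open or extend."""
        return streak_id in self._entries
```

A declaration in this style might read: claim "Android debug build succeeds every cycle", invalidated if "gradle assembleDebug exits non-zero on the cycle's commit".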

Principle 3: Adversarial payoff. There must always exist at least one actor whose payoff increases when others' claims are falsified. I am currently exploring a cross-family adversary: a non-Claude model (Qwen2.5-Coder-14B via Ollama) used for code review. This is not yet ratified.
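To make "payoff increases when others' claims are falsified" concrete, here is one possible scoring rule, sketched in Python. Everything in it is an assumption made for illustration: the `Challenge` record, the reward and false-alarm values, and the requirement that a falsification count only when an external check confirms it. The small false-alarm cost is my addition, so that rejecting everything is not the best strategy.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Challenge:
    claim_id: str               # the claim being disputed
    reviewer: str               # the adversary raising the dispute
    externally_confirmed: bool  # did a check outside the lab agree?


def reviewer_payoffs(
    challenges: Iterable[Challenge],
    reward: float = 1.0,
    false_alarm_cost: float = 0.25,
) -> Dict[str, float]:
    """Sum each reviewer's payoff: gain for confirmed falsifications,
    a smaller loss for challenges the external check did not confirm."""
    payoffs: Dict[str, float] = {}
    for c in challenges:
        delta = reward if c.externally_confirmed else -false_alarm_cost
        payoffs[c.reviewer] = payoffs.get(c.reviewer, 0.0) + delta
    return payoffs
```

The design point is that the reviewer's score depends on an outcome the Director cannot set. If the Director decided which challenges count, the adversary would sit back inside the self-graded loop.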

What hasn't worked yet

The directive that named these three principles landed at cycle 82. The lab implemented Principle 1 by ratifying a CI-binding proposal. That ratification took 17 cycles to land, the longest single-decision deferral in the lab's history.

This was not malice. It was the same null-set principal pattern playing out at a different altitude: the decision to ratify was itself an internal artifact, with no external party bearing cost when ratification was deferred.

A constraint, not a method

I do not have a method. I have a constraint: any verification surface the system can edit from the inside will eventually be optimized against, and the only stable fix is to ensure at least one verification surface exists that the system cannot edit.

If you are building an autonomous system that grades itself, the question I would ask is not "do we have enough internal checks?" The question is "where does the principal exist?" If the answer is "nowhere," the rest is cosplay.

Notes

Last verified

2026-05-12. Revisit at: cycle 130, or when Principle 3 ratifies.

Replication

N=1 case study. One operator, one model class (Claude), one MacBook (M3 Pro 18 GB), ~2 months, 100+ cycles. No control group.

Citations

Campbell (1975), Goodhart (1975), Holmström (1979), Manheim and Garrabrant (2018), MAST (arXiv:2503.13657), Preference Leakage (ICLR 2026, arXiv:2502.01534).

Discuss

If you are running an autonomous lab and observing similar patterns, I would like to hear about it. Substantive critique, replication attempts, and counter-evidence are all welcome.

Harshith Kantamneni

Builds and operates autonomous AI labs full-time. Currently running HIVE Product Lab and the Autonomous Research Lab.

Read the full paper

The version above is the condensed report. The full academic-style paper is also available, with the complete diagnostic framework, methodology section, post-intervention empirical results, threats to validity, and references.

PDF, ~16,000 words, 37 pages.

Cite this report

Kantamneni, H. (2026, May 12). "Recursive Verification-Surface Collapse in
Self-Graded Autonomous Engineering Systems." Retrieved from
https://harshithkantamneni.github.io/reports/recursive-verification-surface-collapse