Same-family LLM judge bias is real · Harshith Kantamneni

ICLR 2026 published a paper called Preference Leakage: A Contamination Problem in LLM-as-a-judge (arXiv 2502.01534). The finding, in plain English: when an LLM judges another LLM's output, and the two share a training distribution, the judge measurably prefers the related model. Any of three relationships is enough to produce the bias: the same model, an inheritance relationship, or just the same model family.

I run a multi-agent ratification gate in which four separate Claude Opus instances (chief architect, CPO, red team, applied researcher) review META-altitude proposals. A proposal is ratified only if none of the four rejects it (4 of 4 NON-REJECT). The intuition behind the design was diversity through structural role differentiation, not model diversity. The four agents see the proposal from different angles: architecture, product, adversarial, research.
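The gate rule is simple enough to sketch in a few lines. This is a minimal illustration, not the actual gate code: the role identifiers, the Verdict enum, and the APPROVE and ABSTAIN labels are hypothetical names chosen here. The only thing taken from the setup above is that any single REJECT blocks ratification.

```python
from __future__ import annotations

from enum import Enum


class Verdict(Enum):
    # Only REJECT comes from the gate description; the other labels are assumed.
    APPROVE = "approve"
    ABSTAIN = "abstain"
    REJECT = "reject"


# Hypothetical identifiers for the four reviewer roles.
ROLES = ("chief_architect", "cpo", "red_team", "applied_researcher")


def ratified(verdicts: dict[str, Verdict]) -> bool:
    """Return True only if every role voted and none of them rejected."""
    missing = set(ROLES) - set(verdicts)
    if missing:
        raise ValueError(f"missing verdicts for roles: {sorted(missing)}")
    return all(verdicts[role] is not Verdict.REJECT for role in ROLES)
```

Note that a rule like this only counts votes. It has no way to tell four independent opinions from one opinion repeated four times.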

The ICLR 2026 paper makes me less confident that's enough. Even with different roles, all four are Claude. They share a training distribution, format preferences, vocabulary biases, and probably the same blind spots about Claude-shaped reasoning. In practice, when proposals get split votes, the dissent tends to be about framing details, not about whether the proposal is sound. The result is convergent verdicts on substance with role-flavored decoration on top.

The fix is structural: at least one verifier in the ratification chain needs to be cross-family. I'm currently exploring Qwen2.5-Coder-14B via Ollama on the same MacBook. It has a different training distribution, different vocabulary biases, and different blind spots. Whether it's strong enough to be a real adversary on subtle judgment calls is what the next few cycles are about.
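One way to make the cross-family requirement explicit is to check it on the panel itself before any votes are counted. The sketch below is an assumption about how that could look, not the gate's real code: the Verifier dataclass, the family labels, and the "cross_family_reviewer" role are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Verifier:
    role: str    # hypothetical role identifier
    model: str   # model name as a free-form label
    family: str  # assumed family label, e.g. "claude" or "qwen"


def has_cross_family_verifier(panel: list[Verifier], primary_family: str) -> bool:
    """Return True if at least one verifier is from outside the primary family."""
    return any(v.family != primary_family for v in panel)


# The current all-Claude panel described above.
claude_panel = [
    Verifier("chief_architect", "Claude Opus", "claude"),
    Verifier("cpo", "Claude Opus", "claude"),
    Verifier("red_team", "Claude Opus", "claude"),
    Verifier("applied_researcher", "Claude Opus", "claude"),
]

# The same panel with one illustrative cross-family verifier added.
mixed_panel = claude_panel + [
    Verifier("cross_family_reviewer", "Qwen2.5-Coder-14B", "qwen"),
]
```

Here `has_cross_family_verifier(claude_panel, "claude")` is False and `has_cross_family_verifier(mixed_panel, "claude")` is True. The check says nothing about whether the cross-family model is a capable reviewer, only that it is present.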

The general principle: same-family bias is a different problem from sample size. Adding more Claude instances to a Claude review board does not give you cross-family diversity. It gives you the same review board with more chairs.
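A rough way to see why more chairs do not help, under a deliberately simplified model that is my assumption and not something from the paper: treat each judge's error as a random variable with variance sigma2, and assume every pair of judges has the same correlation rho. The variance of the averaged verdict is then sigma2 * (rho + (1 - rho) / n), which cannot drop below rho * sigma2 however large n gets.

```python
def variance_of_mean_verdict(n: int, sigma2: float, rho: float) -> float:
    """Variance of the mean of n equally correlated judge errors.

    Assumes each error has variance sigma2 and every pair of judges has
    the same correlation rho (a toy model, not a measured property).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must be in [0, 1] for this sketch")
    return sigma2 * (rho + (1.0 - rho) / n)
```

With an illustrative rho of 0.8 and sigma2 of 1.0, four judges give a variance of about 0.85 and a hundred judges give about 0.80. With rho of 0.0, four judges already give 0.25. The numbers are made up for illustration; the point is that the shared component is the floor.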