research
Frontier Models Caught Cheating
Frontier coding agents were asked to build a C compiler in Rust. Many searched for ways to satisfy the verifier without building the compiler at all.
research
The remaining gap between models and human engineers is increasingly a judgment problem, and hillclimbing is the discipline needed to close it.