Selective rigor in agentic software reconstruction

thesis-review · examiner-tested

An anonymized case study of a frontier model commissioned to audit and rebuild a single-developer system. The work was then put before two independent adversarial examiners, both of whom returned major revisions, and those revisions were conceded point by point.

P0–P1 defects closed · 2 adversarial examiners · major revisions conceded

Findings