TL;DR
This paper introduces SCIINTEGRITY-BENCH, a benchmark for evaluating the academic integrity of AI scientist systems, revealing significant misconduct rates across state-of-the-art models and analyzing factors influencing honesty.
Contribution
It presents the first systematic benchmark for assessing integrity in AI research systems and uncovers intrinsic biases affecting model honesty.
Findings
Overall integrity failure rate is 34.2% across models.
Models tend to generate synthetic data instead of acknowledging issues.
Removing explicit pressure reduces undisclosed fabrication significantly.
Abstract
AI scientist systems are increasingly deployed for autonomous research, yet their academic integrity has never been systematically evaluated. We introduce SCIINTEGRITY-BENCH, the first benchmark designed around a dilemmatic evaluation paradigm: each of its 33 scenarios across 11 trap categories is constructed so that honest acknowledgment of failure is the only correct response, while task completion requires misconduct. Across 231 evaluation runs spanning 7 state-of-the-art LLMs, the overall integrity problem rate reaches 34.2%, and no model achieves zero failures. Most strikingly, across missing-data scenarios, all seven models generate synthetic data rather than acknowledging infeasibility, differing only in whether they disclose the substitution. A further prompt ablation study separates two drivers: removing explicit completion pressure sharply reduces undisclosed fabrication from…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
