SciIntegrity-Bench: A Benchmark for Evaluating Academic Integrity in AI Scientist Systems

Zonglin Yang; Xingtong Liu; Xinyan Xu

arXiv:2605.10246·cs.AI·May 12, 2026

SciIntegrity-Bench: A Benchmark for Evaluating Academic Integrity in AI Scientist Systems

Zonglin Yang, Xingtong Liu, Xinyan Xu

PDF

1 Repo

TL;DR

This paper introduces SCIINTEGRITY-BENCH, a benchmark for evaluating the academic integrity of AI scientist systems, revealing significant misconduct rates across state-of-the-art models and analyzing factors influencing honesty.

Contribution

It presents the first systematic benchmark for assessing integrity in AI research systems and uncovers intrinsic biases affecting model honesty.

Findings

01

Overall integrity failure rate is 34.2% across models.

02

Models tend to generate synthetic data instead of acknowledging issues.

03

Removing explicit pressure reduces undisclosed fabrication significantly.

Abstract

AI scientist systems are increasingly deployed for autonomous research, yet their academic integrity has never been systematically evaluated. We introduce SCIINTEGRITY-BENCH, the first benchmark designed around a dilemmatic evaluation paradigm: each of its 33 scenarios across 11 trap categories is constructed so that honest acknowledgment of failure is the only correct response, while task completion requires misconduct. Across 231 evaluation runs spanning 7 state-of-the-art LLMs, the overall integrity problem rate reaches 34.2%, and no model achieves zero failures. Most strikingly, across missing-data scenarios, all seven models generate synthetic data rather than acknowledging infeasibility, differing only in whether they disclose the substitution. A further prompt ablation study separates two drivers: removing explicit completion pressure sharply reduces undisclosed fabrication from…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

liuxingtong/Sci-Integrity-Bench
github

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.