Jagged AI in Scientific Peer Review: Evidence from POMP Data Analysis
Jin Wook Lee, William Szegda, Zhisheng Song, Edward L. Ionides

TL;DR
This study investigates the uneven performance of AI tools in scientific peer review. It finds a jagged capability profile: AI excels at detecting technical errors but struggles with interpretive and narrative assessments.
Contribution
The paper provides empirical evidence of AI's jagged performance in peer review across diverse tasks and shows that this pattern is inherent to the AI model rather than an artifact of specific instructions.
Findings
AI reviewers detect technical errors effectively
AI struggles with interpretive and narrative errors
Jagged performance pattern is consistent across AI agents
Abstract
Despite their growing use in academic writing and statistical analysis, the performance of artificial intelligence (AI) tools in scientific peer review remains a largely unexplored area. A key challenge is jagged AI, a phenomenon where AI exhibits strong ability spikes in some domains while remaining deficient in others. To study this jaggedness in a practical data science context, we considered the task of reviewing partially observed Markov process (POMP) data analyses. POMP models, also known as state-space models or hidden Markov models, are used to fit mechanistic dynamic models to time series data in diverse applications including disease transmission, ecological dynamics, and financial risk assessment. High-quality peer review in this area entails assessment of scientific context, identification of errors in implementing complex algorithms, and decisions concerning methodological…
