The Story is Not the Science: Execution-Grounded Evaluation of Mechanistic Interpretability Research
Xiaoyan Bai, Alexander Baumgartner, Haojia Sun, Ari Holtzman, Chenhao Tan

TL;DR
This paper introduces an execution-grounded evaluation framework for mechanistic interpretability research. It uses AI agents to verify the reproducibility, coherence, and generalizability of research outputs by examining code and data alongside the paper, going beyond traditional narrative review.
Contribution
It proposes the first execution-grounded framework for evaluating research rigor and implements it as MechEvalAgent, an automated evaluator, addressing the scalability and reproducibility challenges of scientific review.
Findings
Achieves over 80% agreement with human judges
Identifies significant methodological issues in research outputs
Detects 51 additional issues missed by human reviewers
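The agreement figure is presumably a match rate between the agent's verdicts and the human judges' verdicts. Below is a minimal sketch, assuming agreement is measured as the fraction of evaluated items on which both assign the same label. The paper's exact protocol is not described here, and the helper `percent_agreement` and the toy labels are hypothetical.

```python
def percent_agreement(agent_labels: list[str], human_labels: list[str]) -> float:
    """Fraction of items where the agent's verdict equals the human's (hypothetical metric)."""
    if len(agent_labels) != len(human_labels):
        raise ValueError("label lists must be the same length")
    if not agent_labels:
        raise ValueError("need at least one item")
    # Count items where both raters gave the identical label.
    matches = sum(a == h for a, h in zip(agent_labels, human_labels))
    return matches / len(agent_labels)

# Made-up verdicts for five hypothetical rubric items.
agent = ["pass", "fail", "pass", "pass", "fail"]
human = ["pass", "fail", "fail", "pass", "fail"]
print(percent_agreement(agent, human))  # 0.8
```

Raw percent agreement does not correct for chance matches. A chance-corrected statistic such as Cohen's kappa would be a stricter alternative if the label distribution is skewed.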
Abstract
Reproducibility crises across the sciences highlight the limitations of the paper-centric review system in assessing the rigor and reproducibility of research. AI agents that autonomously design and generate large volumes of research outputs exacerbate these challenges. In this work, we address the growing challenges of scalability and rigor by flipping the dynamic and developing AI agents as research evaluators. We propose the first execution-grounded evaluation framework that verifies research beyond narrative review by examining code and data alongside the paper. We use mechanistic interpretability research as a testbed, build standardized research outputs, and develop MechEvalAgent, an automated evaluation framework that assesses the coherence of the experimental process, the reproducibility of results, and the generalizability of findings. We show that our framework achieves above 80%…
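One way to picture the reproducibility dimension is to rerun the submitted code and compare each number the write-up reports with the number the rerun produces. Below is a minimal sketch, assuming results are exposed as named scalar metrics and that a relative tolerance decides a match. The metric names, the 5% tolerance, and `check_reproducibility` are hypothetical and are not MechEvalAgent's actual interface.

```python
import math

def check_reproducibility(reported: dict[str, float],
                          rerun: dict[str, float],
                          rel_tol: float = 0.05) -> dict[str, str]:
    """Compare each reported metric with its re-executed value (hypothetical check)."""
    verdicts = {}
    for name, claimed in reported.items():
        if name not in rerun:
            verdicts[name] = "missing"      # claim has no executed counterpart
        elif math.isclose(claimed, rerun[name], rel_tol=rel_tol):
            verdicts[name] = "reproduced"   # within the relative tolerance
        else:
            verdicts[name] = "mismatch"     # execution disagrees with the claim
    return verdicts

# Hypothetical numbers: what a write-up claims vs. what rerunning its code printed.
reported = {"patching_effect": 0.62, "probe_accuracy": 0.91, "ablation_drop": 0.30}
rerun = {"patching_effect": 0.60, "probe_accuracy": 0.78}
print(check_reproducibility(reported, rerun))
# {'patching_effect': 'reproduced', 'probe_accuracy': 'mismatch', 'ablation_drop': 'missing'}
```

A check like this catches claims that narrative review alone cannot: numbers that drift on re-execution and numbers that no submitted code produces at all.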
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education · Topic Modeling
