The Story is Not the Science: Execution-Grounded Evaluation of Mechanistic Interpretability Research

Xiaoyan Bai; Alexander Baumgartner; Haojia Sun; Ari Holtzman; Chenhao Tan

arXiv:2602.18458 · cs.CY · February 24, 2026

Open Access

TL;DR

This paper introduces an execution-grounded evaluation framework for mechanistic interpretability research. AI agents go beyond narrative review by examining code and data alongside the paper to verify the coherence, reproducibility, and generalizability of research outputs.

Contribution

The paper proposes what the authors describe as the first execution-grounded evaluation framework, together with MechEvalAgent, an automated evaluator of research rigor, to address the scalability and reproducibility challenges of scientific review.

Findings

01 Achieves over 80% agreement with human judges

02 Identifies significant methodological issues in research outputs

03 Detects 51 additional issues missed by human reviewers

Abstract

Reproducibility crises across sciences highlight the limitations of the paper-centric review system in assessing the rigor and reproducibility of research. AI agents that autonomously design and generate large volumes of research outputs exacerbate these challenges. In this work, we address the growing challenges of scalability and rigor by flipping the dynamic and developing AI agents as research evaluators. We propose the first execution-grounded evaluation framework that verifies research beyond narrative review by examining code and data alongside the paper. We use mechanistic interpretability research as a testbed, build standardized research output, and develop MechEvalAgent, an automated evaluation framework that assesses the coherence of the experimental process, the reproducibility of results, and the generalizability of findings. We show that our framework achieves above 80%…
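As a rough illustration of what an agreement figure with human judges can mean, the sketch below computes simple percent agreement between two lists of verdicts. It is an assumption-laden example, not the paper's method: the function name `agreement_rate`, the "pass"/"fail" labels, and the sample data are all hypothetical, and the truncated abstract does not state which agreement metric the authors actually use.

```python
from typing import Sequence


def agreement_rate(agent_labels: Sequence[str], human_labels: Sequence[str]) -> float:
    """Fraction of items on which two raters gave the same label.

    Hypothetical helper: plain percent agreement, which may differ from
    the metric reported in the paper.
    """
    if len(agent_labels) != len(human_labels):
        raise ValueError("both raters must label the same number of items")
    if not agent_labels:
        raise ValueError("at least one labeled item is required")
    matches = sum(a == h for a, h in zip(agent_labels, human_labels))
    return matches / len(agent_labels)


# Made-up verdicts for five evaluation items (illustrative only).
agent = ["pass", "fail", "pass", "pass", "fail"]
human = ["pass", "fail", "fail", "pass", "fail"]
rate = agreement_rate(agent, human)
print(f"agreement: {rate:.0%}")
```

Percent agreement is easy to read but does not correct for agreement expected by chance. A chance-corrected statistic such as Cohen's kappa is a common alternative when label frequencies are imbalanced.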

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education · Topic Modeling