MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs
Zhongshen Zeng, Yinhong Liu, Yingjia Wan, Jingyao Li, Pengguang Chen, Jianbo Dai, Yuxuan Yao, Rongwu Xu, Zehan Qi, Wanru Zhao, Linling Shen, Jianqiao Lu, Haochen Tan, Yukang Chen, Hao Zhang, Zhan Shi, Bailin Wang, Zhijiang Guo, Jiaya Jia

TL;DR
MR-Ben is a new benchmark that evaluates large language models' meta-reasoning ability, specifically their ability to locate and analyse errors in given step-by-step solutions, a skill central to system-2 slow thinking.
Contribution
The paper introduces MR-Ben, a comprehensive process-based benchmark for assessing meta-reasoning skills in LLMs, highlighting current model limitations.
Findings
Open-source models show weaker meta-reasoning abilities.
Models like OpenAI's o1 series excel at scrutinizing solutions.
Benchmark reveals weaknesses in existing LLM training and inference methods.
Abstract
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making, largely based on the step-by-step chain-of-thought reasoning processes. However, evaluating these reasoning abilities has become increasingly challenging. Existing outcome-based benchmarks are beginning to saturate, becoming less effective in tracking meaningful progress. To address this, we present a process-based benchmark, MR-Ben, that demands a meta-reasoning skill, where LMs are asked to locate and analyse potential errors in automatically generated reasoning steps. Our meta-reasoning paradigm is especially suited for system-2 slow thinking, mirroring the human cognitive process of carefully examining assumptions, conditions, calculations, and logic to identify mistakes. MR-Ben comprises 5,975 questions curated by human experts across a wide range of subjects, including physics,…
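To make the error-locating task concrete, here is a minimal sketch of how one might represent an annotated solution and score a model's guess at the first incorrect step. The `AnnotatedSolution` class, its field names, and the `localisation_accuracy` function are hypothetical illustrations, not the benchmark's actual data schema or scoring metric, which the text above does not specify.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class AnnotatedSolution:
    """Hypothetical item: a question, a step-by-step solution, and a label."""
    question: str
    steps: Sequence[str]
    # 0-based index of the first incorrect step, or None if every step is correct.
    first_error_step: Optional[int]


def localisation_accuracy(
    items: Sequence[AnnotatedSolution],
    predictions: Sequence[Optional[int]],
) -> float:
    """Fraction of items whose predicted first-error step matches the label.

    This is an assumed, simplified score for illustration only.
    """
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    if not items:
        return 0.0
    hits = sum(
        1 for item, pred in zip(items, predictions)
        if item.first_error_step == pred
    )
    return hits / len(items)


# Toy data, invented for this example.
items = [
    AnnotatedSolution(
        question="What is 3 * (2 + 4)?",
        steps=["2 + 4 = 6", "3 * 6 = 20"],  # the second step is wrong
        first_error_step=1,
    ),
    AnnotatedSolution(
        question="What is 10 - 7?",
        steps=["10 - 7 = 3"],  # fully correct solution
        first_error_step=None,
    ),
]

# The model finds the error in the first item but wrongly flags the second.
predictions = [1, 0]
print(localisation_accuracy(items, predictions))
```

Note that a correct solution is labelled `None`, so a model that flags an error in every solution is penalised on such items. This reflects the idea of judging the reasoning process instead of only the final answer.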
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Attention Is All You Need · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Dropout · Adam · Linear Layer · Absolute Position Encodings
