MIRAGE: Evaluating and Explaining Inductive Reasoning Process in Language Models
Jiachun Li, Pengfei Cao, Zhuoran Jin, Yubo Chen, Kang Liu, Jun Zhao

TL;DR
MIRAGE is a synthetic dataset for evaluating LLMs' inductive reasoning. It reveals that models rely on neighbor-based reasoning rather than rule-based generalization, and it analyzes the factors that affect their reasoning process.
Contribution
The paper presents MIRAGE, a comprehensive dataset for evaluating inductive reasoning in LLMs, and provides insights into their reasoning strategies and limitations.
Findings
LLMs often do not rely on correct rules for inductive reasoning.
Models tend to focus on similar observed facts near test examples.
Neighbor-based reasoning significantly improves deductive performance.
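The contrast between rule-based and neighbor-based reasoning in the findings above can be made concrete with a toy sketch. This is an illustration only, not MIRAGE's data construction or the paper's evaluation code: the hidden rule, the observed facts, and the helper names (`hidden_rule`, `induce_rule`, `rule_based_predict`, `neighbor_based_predict`) are all assumptions chosen for the example. A rule-based reasoner recovers the rule and generalizes anywhere, while a neighbor-based reasoner copies the closest observed fact and therefore degrades as the test case moves away from the observations.

```python
from itertools import product


def hidden_rule(x):
    # Hypothetical ground-truth rule, chosen only for illustration.
    a, b = x
    return 2 * a + 3 * b


# Observed facts: (input, output) pairs clustered in a small region.
facts = [((a, b), hidden_rule((a, b))) for a, b in [(1, 1), (1, 2), (2, 1), (2, 2)]]


def induce_rule(facts, coeff_range=range(-5, 6)):
    """Rule-based induction: find integer (p, q) with p*a + q*b == y for every fact."""
    for p, q in product(coeff_range, repeat=2):
        if all(p * a + q * b == y for (a, b), y in facts):
            return p, q
    return None


def rule_based_predict(facts, x):
    """Induce the rule from the facts, then apply it to the unseen input."""
    p, q = induce_rule(facts)
    return p * x[0] + q * x[1]


def neighbor_based_predict(facts, x):
    """Copy the output of the nearest observed fact (squared Euclidean distance)."""
    _, nearest_output = min(
        facts,
        key=lambda f: (f[0][0] - x[0]) ** 2 + (f[0][1] - x[1]) ** 2,
    )
    return nearest_output


if __name__ == "__main__":
    for label, x in [("near", (2, 3)), ("far", (9, 9))]:
        truth = hidden_rule(x)
        rule_err = abs(rule_based_predict(facts, x) - truth)
        nn_err = abs(neighbor_based_predict(facts, x) - truth)
        print(f"{label}: rule-based error={rule_err}, neighbor-based error={nn_err}")
```

In this toy setting the neighbor-based error is small for the test case close to the observed facts and much larger for the distant one, which mirrors the localized behavior described in the findings without claiming anything about how real models compute their answers.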
Abstract
Inductive reasoning, which requires a model to generalize rules from observed facts and then apply them to unseen examples, is an essential capability for large language models (LLMs) to achieve higher intelligence. We present MIRAGE, a synthetic dataset that addresses the limitations of previous work, specifically the lack of comprehensive evaluation and flexible test data. In it, we evaluate LLMs' capabilities in both the inductive and deductive stages, allowing for flexible variation in input distribution, task scenario, and task difficulty to analyze the factors influencing LLMs' inductive reasoning. Based on these multi-faceted evaluations, we demonstrate that LLMs are poor rule-based reasoners. In many cases, when conducting inductive reasoning, they do not rely on a correct rule to answer the unseen case. From the perspectives of different prompting methods, observation…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)
