InductionBench: LLMs Fail in the Simplest Complexity Class
Wenyue Hua, Tyler Wong, Sun Fei, Liangming Pan, Adam Jardine, William Yang Wang

TL;DR
InductionBench is a new benchmark revealing that current large language models struggle with basic inductive reasoning tasks, exposing a significant gap between their deductive and inductive reasoning capabilities.
Contribution
The paper introduces InductionBench, the first benchmark specifically designed to evaluate inductive reasoning in LLMs, highlighting their limitations in simple inductive tasks.
Findings
LLMs perform poorly on basic inductive reasoning tasks
Current models excel in deductive reasoning but not in inductive reasoning
Inductive reasoning remains a challenging area for state-of-the-art LLMs
Abstract
Large language models (LLMs) have shown remarkable improvements in reasoning, and many existing benchmarks have been addressed by models such as o1 and o3 either fully or partially. However, a majority of these benchmarks emphasize deductive reasoning, including mathematical and coding tasks in which rules such as mathematical axioms or programming syntax are clearly defined, based on which LLMs can plan and apply these rules to arrive at a solution. In contrast, inductive reasoning, where one infers the underlying rules from observed data, remains less explored. Such inductive processes lie at the heart of scientific discovery, as they enable researchers to extract general principles from empirical observations. To assess whether LLMs possess this capacity, we introduce InductionBench, a new benchmark designed to evaluate the inductive reasoning ability of LLMs. Our experimental…
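The abstract contrasts deduction (applying rules that are given) with induction (recovering the rule from observations). As a toy illustration of the inductive direction only, the Python sketch below takes input/output string pairs produced by a hidden rule in which each output symbol depends on a fixed window of the current and k-1 preceding input symbols, tabulates that window-to-symbol mapping, and reports when the examples contradict every rule of that shape. The function names, the `#` padding symbol, the length-preserving restriction, and the example rule ("b becomes c right after a") are all illustrative assumptions; this is not InductionBench's data format, task generator, or evaluation code.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: a context is the tuple of the current input
# symbol and the k-1 symbols before it; '#' pads positions before the start.
Context = Tuple[str, ...]
PAD = "#"


def induce_local_map(pairs: List[Tuple[str, str]], k: int) -> Optional[Dict[Context, str]]:
    """Infer a context -> output-symbol table consistent with all examples.

    Returns None if no length-preserving rule with window size k fits the data.
    """
    table: Dict[Context, str] = {}
    for inp, out in pairs:
        if len(inp) != len(out):
            return None  # toy assumption: only length-preserving rules are considered
        padded = PAD * (k - 1) + inp
        for i, out_symbol in enumerate(out):
            window = tuple(padded[i:i + k])  # window ends at input position i
            # setdefault stores the first observed output; a different output
            # for the same context means no k-local rule explains the examples.
            if table.setdefault(window, out_symbol) != out_symbol:
                return None
    return table


def apply_local_map(table: Dict[Context, str], s: str, k: int) -> Optional[str]:
    """Apply an induced table to a new string; None if a context was never observed."""
    padded = PAD * (k - 1) + s
    result = []
    for i in range(len(s)):
        window = tuple(padded[i:i + k])
        if window not in table:
            return None  # the examples did not determine this context
        result.append(table[window])
    return "".join(result)


if __name__ == "__main__":
    # Examples generated by the hidden rule "b becomes c right after a".
    examples = [("ab", "ac"), ("bb", "bb"), ("ba", "ba"), ("aa", "aa")]
    rule = induce_local_map(examples, k=2)
    print(apply_local_map(rule, "abab", k=2))   # acac
    print(induce_local_map(examples, k=1))      # None: a window of 1 cannot fit the data
```

The window size matters: with k=1 the pairs `("ab", "ac")` and `("bb", "bb")` demand two different outputs for the same context `('b',)`, so the sketch correctly rejects that hypothesis, while k=2 yields a consistent table that generalizes to unseen strings such as `"abab"`.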
