InductionBench: LLMs Fail in the Simplest Complexity Class

Wenyue Hua; Tyler Wong; Sun Fei; Liangming Pan; Adam Jardine; William Yang Wang

arXiv:2502.15823·cs.LG·May 15, 2025

TL;DR

InductionBench is a new benchmark that reveals current large language models struggle with basic inductive reasoning tasks, exposing a significant gap in their reasoning capabilities beyond deductive tasks.

Contribution

The paper introduces InductionBench, a benchmark designed specifically to evaluate inductive reasoning in LLMs, and uses it to show that current models struggle even on simple inductive tasks.

Findings

01 LLMs perform poorly on basic inductive reasoning tasks

02 Current models excel in deductive reasoning but not in inductive reasoning

03 Inductive reasoning remains a challenging area for state-of-the-art LLMs

Abstract

Large language models (LLMs) have shown remarkable improvements in reasoning and many existing benchmarks have been addressed by models such as o1 and o3 either fully or partially. However, a majority of these benchmarks emphasize deductive reasoning, including mathematical and coding tasks in which rules such as mathematical axioms or programming syntax are clearly defined, based on which LLMs can plan and apply these rules to arrive at a solution. In contrast, inductive reasoning, where one infers the underlying rules from observed data, remains less explored. Such inductive processes lie at the heart of scientific discovery, as they enable researchers to extract general principles from empirical observations. To assess whether LLMs possess this capacity, we introduce InductionBench, a new benchmark designed to evaluate the inductive reasoning ability of LLMs. Our experimental…

Code & Models

Repositories

wenyueh/inductive_reasoning_benchmark
Official

Datasets

wenyueH/InductionBench
Dataset · 36 downloads

Taxonomy

Topics: Machine Learning in Materials Science · Text Readability and Simplification · Topic Modeling