Reasoning Can Hurt the Inductive Abilities of Large Language Models
Haibo Jin, Peiyan Zhang, Man Luo, Haohan Wang

TL;DR
This paper shows that chain-of-thought prompting can impair inductive reasoning in large language models and proposes structured interventions that improve their performance without retraining.
Contribution
It develops a theoretical framework explaining how reasoning steps can amplify errors, and it introduces interventions that enhance inductive reasoning in LLMs.
Findings
CoT can degrade inductive performance in LLMs.
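Structured interventions improve inductive accuracy without retraining.
Errors are amplified through three failure modes: incorrect sub-task decomposition, incorrect sub-task solving, and incorrect final answer summarization.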
Abstract
Large Language Models (LLMs) have shown remarkable progress across domains, yet their ability to perform inductive reasoning (inferring latent rules from sparse examples) remains limited. It is often assumed that chain-of-thought (CoT) prompting, as used in Large Reasoning Models (LRMs), enhances such reasoning. We investigate this assumption by creating four controlled, diagnostic game-based tasks (chess, Texas Hold'em, dice games, and blackjack) with hidden human-defined rules. We find that CoT reasoning can degrade inductive performance, with LRMs often underperforming their non-reasoning counterparts. To explain this, we present a theoretical framework that reveals how reasoning steps can amplify error through three failure modes: incorrect sub-task decomposition, incorrect sub-task solving, and incorrect final answer summarization. Based on our theoretical and empirical…
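One intuitive way to see how longer reasoning chains can amplify error is a toy model in which an answer is correct only if every stage succeeds. The sketch below is not the paper's theoretical framework. It assumes, purely for illustration, that decomposition, each sub-task solution, and the final summarization fail independently, so their success probabilities multiply. The class `StageAccuracy`, the function `chain_success`, and all probability values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StageAccuracy:
    """Hypothetical per-stage success probabilities (illustrative, not from the paper)."""
    decompose: float  # P(sub-task decomposition is correct)
    solve: float      # P(a single sub-task is solved correctly)
    summarize: float  # P(final answer summarization is correct)


def chain_success(acc: StageAccuracy, num_subtasks: int) -> float:
    """Probability that every stage of a reasoning chain succeeds.

    Assumes (as a simplification) that stages fail independently, so the
    chain succeeds with probability decompose * solve**k * summarize.
    """
    if num_subtasks < 0:
        raise ValueError("num_subtasks must be non-negative")
    return acc.decompose * acc.solve ** num_subtasks * acc.summarize


if __name__ == "__main__":
    # Hypothetical accuracies chosen only to show the trend.
    acc = StageAccuracy(decompose=0.9, solve=0.95, summarize=0.95)
    for k in (1, 3, 5, 10):
        print(f"{k:2d} sub-tasks -> P(correct) = {chain_success(acc, k):.3f}")
```

Under this independence assumption, the probability of an error-free chain shrinks geometrically with the number of sub-tasks, even when each individual stage is fairly reliable.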
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
