TL;DR
This paper investigates the reasoning controllability of large language models, revealing their tendency to prioritize sensibility over compliance, and demonstrates methods to steer models towards more compliant reasoning patterns.
Contribution
It provides the first systematic analysis of reasoning conflicts in LLMs and introduces techniques to improve instruction following by controlling reasoning types.
Findings
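Models prioritize sensibility over compliance: they favor the task-appropriate reasoning pattern even when instructions mandate a conflicting one.
Confidence scores drop during reasoning conflicts, suggesting that the conflicts are detectable from the model's own signals.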
Activation patterns encode reasoning types linearly, enabling controllability.
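To make the "linear encoding" finding concrete, here is a minimal sketch of one common way such a claim is tested and exploited: estimate a direction as the difference of class means, check that projecting onto it separates the two classes, then shift activations along it. This is not the paper's method, which this summary does not describe. The data is synthetic, and every name (`make_synthetic`, `mean_difference_direction`, `steer`, the labels "deductive" and "inductive") is a hypothetical chosen for illustration.

```python
import numpy as np


def make_synthetic(seed=0, n=200, dim=16):
    """Synthetic stand-in for hidden activations (assumption: not real model data).

    Two classes differ only by an offset along the first coordinate.
    """
    rng = np.random.default_rng(seed)
    true_dir = np.zeros(dim)
    true_dir[0] = 1.0
    deductive = rng.normal(size=(n, dim)) + 2.0 * true_dir
    inductive = rng.normal(size=(n, dim)) - 2.0 * true_dir
    return deductive, inductive


def mean_difference_direction(acts_a, acts_b):
    """Unit vector pointing from the mean of acts_b to the mean of acts_a."""
    direction = acts_a.mean(axis=0) - acts_b.mean(axis=0)
    return direction / np.linalg.norm(direction)


def steer(activations, direction, strength):
    """Shift activations along the direction by a fixed strength."""
    return activations + strength * direction


deductive, inductive = make_synthetic()
direction = mean_difference_direction(deductive, inductive)

# Linear readout: project onto the direction, threshold at the midpoint of the means.
midpoint = (deductive.mean(axis=0) + inductive.mean(axis=0)) / 2.0


def project(acts):
    return (acts - midpoint) @ direction


accuracy = ((project(deductive) > 0).mean() + (project(inductive) < 0).mean()) / 2.0

# Steering: push "inductive" activations toward the "deductive" side.
steered = steer(inductive, direction, strength=6.0)

print(f"probe accuracy: {accuracy:.3f}")
print(f"mean projection before steering: {project(inductive).mean():.2f}")
print(f"mean projection after steering:  {project(steered).mean():.2f}")
```

In a real setting the arrays would be activations collected from a chosen layer of the model, and the steering strength would need tuning, since too large a shift typically degrades output quality.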
Abstract
Large Language Models (LLMs) are known to acquire reasoning capabilities through shared inference patterns in pre-training data, which are further elicited via Chain-of-Thought (CoT) practices. However, whether fundamental reasoning patterns, such as induction, deduction, and abduction, can be decoupled from specific problem instances remains a critical open question for model controllability. In this paper, we present the first systematic investigation of this problem through the lens of reasoning conflicts: an explicit tension between parametric and contextual information induced by mandating logical schemata that deviate from those expected for a target task. Our evaluation reveals that LLMs consistently prioritize sensibility over compliance, favoring task-appropriate reasoning patterns despite conflicting instructions. Notably, task…
