In-Token Rationality Optimization: Towards Accurate and Concise LLM Reasoning via Self-Feedback
Mingye Zhu, Yi Liu, Zheren Fu, Quan Wang, Yongdong Zhang

TL;DR
InTRO introduces a token-level exploration and self-feedback framework for training LLMs, significantly improving reasoning accuracy and conciseness across multiple benchmarks and demonstrating strong generalization capabilities.
Contribution
The paper proposes InTRO, a method that uses token-wise importance weights to guide next-token selection during reasoning, addressing limitations of prior supervised fine-tuning and reinforcement learning approaches.
Findings
Outperforms baselines with up to 20% accuracy improvement
Produces more concise and less verbose rationales
Successfully transfers reasoning skills to out-of-domain tasks
Abstract
Training Large Language Models (LLMs) for chain-of-thought reasoning presents a significant challenge: supervised fine-tuning on a single "golden" rationale hurts generalization because it penalizes equally valid alternatives, whereas reinforcement learning with verifiable rewards struggles with credit assignment and prohibitive computational cost. To tackle these limitations, we introduce InTRO (In-Token Rationality Optimization), a new framework that enables both token-level exploration and self-feedback for accurate and concise reasoning. Instead of directly optimizing an intractable objective over all valid reasoning paths, InTRO leverages correction factors (token-wise importance weights estimated by the information discrepancy between the generative policy and its answer-conditioned counterpart) for informative next-token selection. This approach allows the model to perform token-level…
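The abstract does not give the exact formula for the correction factors, so the following is only a minimal sketch under stated assumptions. It assumes the "information discrepancy" for a token is the log-probability gap between the answer-conditioned policy, pi(y_t | x, a, y_<t), and the plain generative policy, pi(y_t | x, y_<t), so the weight is their probability ratio. The upper clip (`clip_max`), the averaged weighted negative log-likelihood, and the function names `token_importance_weights` and `weighted_nll` are hypothetical choices for illustration, not the authors' implementation.

```python
import math
from typing import List


def token_importance_weights(
    logp_policy: List[float],
    logp_answer_cond: List[float],
    clip_max: float = 5.0,
) -> List[float]:
    """Hypothetical per-token weights w_t = pi(y_t | x, a, y_<t) / pi(y_t | x, y_<t).

    Assumption: the information discrepancy is the log-probability difference
    between the answer-conditioned policy and the plain policy. Weights are
    clipped from above (an assumed stabilizer) to avoid extreme values.
    """
    if len(logp_policy) != len(logp_answer_cond):
        raise ValueError("both sequences must score the same tokens")
    weights = []
    for lp, lq in zip(logp_policy, logp_answer_cond):
        ratio = math.exp(lq - lp)  # > 1 when knowing the answer makes the token likelier
        weights.append(min(ratio, clip_max))
    return weights


def weighted_nll(logp_policy: List[float], weights: List[float]) -> float:
    """Importance-weighted negative log-likelihood, averaged over tokens.

    In a real training loop the weights would be treated as constants
    (no gradient flows through them); here they are plain floats.
    """
    if not logp_policy:
        return 0.0
    total = sum(w * -lp for w, lp in zip(weights, logp_policy))
    return total / len(logp_policy)


if __name__ == "__main__":
    # Toy 3-token rationale: the second token becomes much likelier once the
    # answer is known, the third token becomes less likely.
    policy = [math.log(0.5), math.log(0.2), math.log(0.9)]
    answer_cond = [math.log(0.5), math.log(0.6), math.log(0.45)]
    w = token_importance_weights(policy, answer_cond)
    print([round(x, 3) for x in w])
    print(round(weighted_nll(policy, w), 4))
```

Under this assumption, tokens that become more probable once the answer is known (the second token, weight 3.0) are up-weighted, while tokens the answer makes less plausible (the third token, weight 0.5) are down-weighted, which is one way to read "informative next-token selection".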
