In-Token Rationality Optimization: Towards Accurate and Concise LLM Reasoning via Self-Feedback

Mingye Zhu; Yi Liu; Zheren Fu; Quan Wang; Yongdong Zhang

arXiv:2511.09865·cs.CL·November 14, 2025

TL;DR

InTRO introduces a token-level exploration and self-feedback framework for training LLMs, significantly improving reasoning accuracy and conciseness across multiple benchmarks and demonstrating strong generalization capabilities.

Contribution

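The paper proposes InTRO, a method that uses token-wise importance weights (correction factors) to guide next-token selection. It targets two limitations of earlier approaches: supervised fine-tuning on a single rationale penalizes equally valid alternatives, and reinforcement learning with verifiable rewards struggles with credit assignment and computational cost.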

Findings

01 Outperforms baselines with up to 20% accuracy improvement

02 Produces more concise rationales

03 Transfers reasoning skills to out-of-domain tasks

Abstract

Training Large Language Models (LLMs) for chain-of-thought reasoning presents a significant challenge: supervised fine-tuning on a single "golden" rationale hurts generalization because it penalizes equally valid alternatives, whereas reinforcement learning with verifiable rewards struggles with credit assignment and prohibitive computational cost. To tackle these limitations, we introduce InTRO (In-Token Rationality Optimization), a new framework that enables both token-level exploration and self-feedback for accurate and concise reasoning. Instead of directly optimizing an intractable objective over all valid reasoning paths, InTRO leverages correction factors (token-wise importance weights estimated from the information discrepancy between the generative policy and its answer-conditioned counterpart) for informative next-token selection. This approach allows the model to perform token-level…
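The abstract does not give the exact estimator for the correction factors, so the following is only a minimal sketch of one plausible reading: the weight of each candidate next token is the ratio of its probability under the answer-conditioned policy to its probability under the plain generative policy. The ratio form and the names `correction_factors` and `upweighted_tokens` are assumptions made for illustration, not the paper's definitions.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def correction_factors(policy_logits, conditioned_logits):
    """Token-wise weights w_i = q_i / p_i (assumed form, not from the paper).

    p is the generative policy's next-token distribution;
    q is the same model's distribution when also conditioned on the answer.
    """
    log_p = log_softmax(policy_logits)
    log_q = log_softmax(conditioned_logits)
    return [math.exp(lq - lp) for lp, lq in zip(log_p, log_q)]


def upweighted_tokens(policy_logits, conditioned_logits):
    """Indices of candidate tokens that knowing the answer makes more likely."""
    weights = correction_factors(policy_logits, conditioned_logits)
    return [i for i, w in enumerate(weights) if w > 1.0]


# Toy example over a three-token vocabulary: conditioning on the answer
# shifts probability mass toward token 2.
policy = [1.0, 1.0, 1.0]
conditioned = [1.0, 1.0, 3.0]
weights = correction_factors(policy, conditioned)
favored = upweighted_tokens(policy, conditioned)
```

Under this reading, a weight above 1 marks a token that the answer-conditioned model prefers more strongly than the unconditioned policy does, which is one way such a signal could act as self-feedback at the token level.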

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Graph Neural Networks