Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs

Jingyao Wang; Wenwen Qiang; Zeen Song; Changwen Zheng; Hui Xiong

arXiv:2505.10425 · cs.LG · October 16, 2025

TL;DR

L2T is a reinforcement fine-tuning framework that improves the reasoning of large language models while using fewer tokens. It rewards each reasoning episode for its information gain and requires no extra annotations.

Contribution

L2T introduces a universal, dense, information-theoretic process reward for reinforcement fine-tuning of LLMs, improving reasoning effectiveness and efficiency without extra annotations or task-specific evaluators.

Findings

01 Boosts reasoning effectiveness across benchmarks

02 Reduces token usage in reasoning chains

03 Achieves efficient model updates with theoretical guarantees

Abstract

Large language models (LLMs) excel at complex tasks thanks to advances in their reasoning abilities. However, existing methods overlook the trade-off between reasoning effectiveness and efficiency, often encouraging unnecessarily long reasoning chains and wasting tokens. To address this, we propose Learning to Think (L2T), an information-theoretic reinforcement fine-tuning framework that lets LLMs achieve optimal reasoning with fewer tokens. Specifically, L2T treats each query-response interaction as a hierarchical session of multiple episodes and proposes a universal dense process reward that quantifies the episode-wise information gain in parameters, requiring no extra annotations or task-specific evaluators. We propose a method to quickly estimate this reward based on PAC-Bayes bounds and the Fisher information matrix. Theoretical analyses show that it significantly…
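The abstract does not spell out the estimator, so the following is only a generic illustration of why a Fisher information matrix can stand in for an information measure over parameters. It is not the L2T reward. It uses the standard second-order fact that, for a small parameter step, the KL divergence between the model before and after the step is approximately half the Fisher quadratic form of that step. The toy categorical model and all names here (`fisher_categorical`, `fisher_quadratic_gain`, `theta`, `delta`) are assumptions made for the sketch.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()


def fisher_categorical(logits):
    """Fisher information of a categorical distribution w.r.t. its logits.

    For p = softmax(logits), the Fisher matrix is diag(p) - p p^T.
    """
    p = softmax(logits)
    return np.diag(p) - np.outer(p, p)


def kl_categorical(p, q):
    """Exact KL(p || q) for two strictly positive categorical distributions."""
    return float(np.sum(p * (np.log(p) - np.log(q))))


def fisher_quadratic_gain(logits, delta):
    """Second-order proxy 0.5 * delta^T F delta for the KL caused by a step.

    Hypothetical helper for illustration; not the estimator from the paper.
    """
    F = fisher_categorical(logits)
    return 0.5 * float(delta @ F @ delta)


# Toy "parameters" (logits) and a small update, both made up for the demo.
theta = np.array([0.2, -0.5, 1.0, 0.0])
delta = 0.01 * np.array([1.0, -2.0, 0.5, 0.3])

exact = kl_categorical(softmax(theta), softmax(theta + delta))
approx = fisher_quadratic_gain(theta, delta)

print(f"exact KL          : {exact:.3e}")
print(f"Fisher quadratic  : {approx:.3e}")
```

For small steps the two printed values agree closely, which is the property that makes a Fisher-based quantity cheap to evaluate compared with recomputing a divergence directly. How L2T combines this kind of quantity with PAC-Bayes bounds is described in the paper itself, not here.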

Videos

Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs · SlidesLive

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications

Methods: Balanced Selection