Not All Thoughts are Generated Equal: Efficient LLM Reasoning via Multi-Turn Reinforcement Learning
Yansong Ning, Wei Li, Jun Fang, Naiqiang Tan, Hao Liu

TL;DR
This paper introduces Long×Short, a collaborative reasoning framework in which two LLMs split the work: one generates the important thoughts and the other generates the remaining ones, significantly reducing token length while maintaining reasoning performance.
Contribution
It proposes a novel multi-turn reinforcement learning approach for LLM collaboration that weighs the importance of individual thoughts to make long chain-of-thought reasoning more efficient.
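The multi-turn collaboration can be pictured as a loop in which the two models take turns extending a single reasoning trace. The sketch below is illustrative only and assumes strict alternation starting with the long-thought model and a plain-text stop marker. `collaborate`, `Trace`, the `<answer>` marker and the toy models are all hypothetical. The RL training that teaches each model its role is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical interface: a "model" maps (question, thoughts so far) to the next thought.
Model = Callable[[str, List[str]], str]


@dataclass
class Trace:
    thoughts: List[str] = field(default_factory=list)
    roles: List[str] = field(default_factory=list)  # "long" or "short" per thought


def collaborate(question: str, long_llm: Model, short_llm: Model,
                max_turns: int = 8, stop_marker: str = "<answer>") -> Trace:
    """Let the two models take turns extending one reasoning trace.

    Assumes strict alternation starting with the long-thought model; stops when a
    thought contains stop_marker or after max_turns thoughts.
    """
    trace = Trace()
    for turn in range(max_turns):
        role, model = ("long", long_llm) if turn % 2 == 0 else ("short", short_llm)
        thought = model(question, list(trace.thoughts))  # pass a copy of the history
        trace.thoughts.append(thought)
        trace.roles.append(role)
        if stop_marker in thought:
            break
    return trace


# Toy stand-ins for the two LLMs, only to make the loop runnable.
def toy_long(question: str, history: List[str]) -> str:
    return f"[long] analyse step {len(history)}"


def toy_short(question: str, history: List[str]) -> str:
    # Finishes once at least three thoughts are already in the trace.
    return "<answer> 42" if len(history) >= 3 else f"[short] compute step {len(history)}"


trace = collaborate("What is 6 * 7?", toy_long, toy_short)
print(trace.roles)  # ['long', 'short', 'long', 'short']
```

In a real system each callable would wrap an LLM call. The two roles would also differ in the kind of thought they produce, not just in their position in the turn order.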
Findings
Achieves over 80% token reduction across multiple benchmarks.
Maintains reasoning performance comparable to that of larger models.
Demonstrates effective collaboration between long-thought and short-thought LLMs.
Abstract
Compressing long chain-of-thought (CoT) from large language models (LLMs) is an emerging strategy to improve the reasoning efficiency of LLMs. Despite its promising benefits, existing studies equally compress all thoughts within a long CoT, hindering more concise and effective reasoning. To this end, we first investigate the importance of different thoughts by examining their effectiveness and efficiency in contributing to reasoning through automatic long CoT chunking and Monte Carlo rollouts. Building upon the insights, we propose a theoretically bounded metric to jointly measure the effectiveness and efficiency of different thoughts. We then propose Long×Short, an efficient reasoning framework that enables two LLMs to collaboratively solve the problem: a long-thought LLM for more effectively generating important thoughts, while a short-thought LLM for efficiently generating…
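The abstract's analysis step, chunking a long CoT and using Monte Carlo rollouts to see how much each thought helps, can be sketched as follows. The sketch makes several assumptions. The blank-line `chunk_cot` is a stand-in, not the paper's automatic chunking. `rollout` is a hypothetical callable that samples one continuation of a CoT prefix and returns its final answer. Word count is used as a rough token proxy. The code reports only the two raw ingredients, the accuracy gain per chunk and its length. The paper's theoretically bounded metric that combines them is not reproduced here.

```python
from typing import Callable, List, Tuple

# Hypothetical signature: sample one continuation of the CoT prefix and return
# its final answer. With a real LLM this would be a sampled (temperature > 0) call.
Rollout = Callable[[str, str], str]


def chunk_cot(cot: str) -> List[str]:
    """Stand-in chunker: split on blank lines. NOT the paper's automatic chunking."""
    return [c.strip() for c in cot.split("\n\n") if c.strip()]


def prefix_accuracy(question: str, prefix: str, gold: str,
                    rollout: Rollout, n: int) -> float:
    """Monte Carlo estimate of P(correct final answer | CoT prefix)."""
    hits = sum(rollout(question, prefix) == gold for _ in range(n))
    return hits / n


def chunk_contributions(question: str, cot: str, gold: str,
                        rollout: Rollout, n: int = 32) -> List[Tuple[float, int]]:
    """For each chunk i return (accuracy gain, length), where the gain is
    acc(prefix ending at chunk i) - acc(prefix ending at chunk i - 1)
    and length is a whitespace word count used as a rough token proxy."""
    chunks = chunk_cot(cot)
    results = []
    prev = prefix_accuracy(question, "", gold, rollout, n)  # no thoughts yet
    for i, chunk in enumerate(chunks):
        prefix = "\n\n".join(chunks[: i + 1])
        acc = prefix_accuracy(question, prefix, gold, rollout, n)
        results.append((acc - prev, len(chunk.split())))
        prev = acc
    return results


# Toy rollout: deterministic so the output is checkable; it "solves" the problem
# only once the key step appears in the prefix.
def toy_rollout(question: str, prefix: str) -> str:
    return "4" if "x = 4" in prefix else "?"


cot = "Let me restate the problem.\n\nSo 2x = 8, hence x = 4.\n\nDouble-check: 2*4 = 8."
print(chunk_contributions("Solve 2x = 8.", cot, "4", toy_rollout))
# → [(0.0, 5), (1.0, 8), (0.0, 4)]
```

Under this reading, the second chunk is the "important" thought: it carries the whole accuracy gain. The restatement and the double-check add length without changing the outcome.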
