TL;DR: Too Long, Do Re-weighting for Efficient LLM Reasoning Compression
Zhong-Zhi Li, Xiao Liang, Zihao Tang, Lei Ji, Peijie Wang, Haotian Xu, Xing W, Haizhen Huang, Weiwei Deng, Yeyun Gong, Zhijiang Guo, Xiao Liu, Fei Yin, Cheng-Lin Liu

TL;DR
This paper introduces a dynamic re-weighting training method for large language models that reduces reasoning output length by nearly 40% without sacrificing accuracy, improving inference efficiency.
Contribution
The authors propose a novel ratio-based training pipeline that balances reasoning data to eliminate redundancy without complex annotations or multiple models.
Findings
Reduces output tokens by nearly 40%
Maintains reasoning accuracy
Validated on multiple models and benchmarks
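The "nearly 40%" figure is a relative reduction in output tokens. As a minimal sketch of how such a number can be computed, the helper below compares total token counts before and after compression. The function name and the token counts in the usage note are hypothetical, chosen for illustration, and are not taken from the paper.

```python
def token_reduction(baseline_tokens, compressed_tokens):
    """Relative drop in total output tokens (hypothetical helper, not from the paper).

    Both arguments are lists of per-example output token counts for the
    same set of prompts, before and after compression training.
    """
    base_total = sum(baseline_tokens)
    if base_total == 0:
        raise ValueError("baseline token count must be positive")
    return 1.0 - sum(compressed_tokens) / base_total
```

With illustrative counts of [1000, 2000] tokens before and [600, 1200] after, the totals are 3000 and 1800, so the helper returns a reduction of 0.4.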
Abstract
Large Language Models (LLMs) have recently achieved remarkable progress by leveraging Reinforcement Learning and extended Chain-of-Thought (CoT) techniques. However, the challenge of performing efficient language reasoning, especially during inference with extremely long outputs, has drawn increasing attention from the research community. In this work, we propose a dynamic ratio-based training pipeline that does not rely on sophisticated data annotations or interpolation between multiple models. We continuously balance the weights between the model's System-1 and System-2 data to eliminate redundant reasoning processes while preserving the model's reasoning capability. We validate our approach on DeepSeek-R1-Distill-7B and DeepSeek-R1-Distill-14B and on a diverse set of benchmarks with varying difficulty levels. Our method significantly reduces the number of output tokens…
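The idea of continuously balancing System-1 (short) and System-2 (long CoT) data can be pictured with a small sketch. The code below is an assumption-laden illustration, not the paper's Algorithm 1: the update rule, the names `MixState`, `update_ratio`, and `sample_batch`, and the `step_size` and `tolerance` parameters are all hypothetical. It shifts the mix toward long-reasoning data when validation accuracy falls below a baseline and toward short data otherwise.

```python
import random
from dataclasses import dataclass


@dataclass
class MixState:
    # Share of long (System-2) samples per batch; the rest are short (System-1).
    system2_ratio: float = 0.5


def update_ratio(state, accuracy, baseline_accuracy, step_size=0.1, tolerance=0.01):
    """Hypothetical rule: add long-CoT data when accuracy slips, else favor short data."""
    if accuracy < baseline_accuracy - tolerance:
        new_ratio = state.system2_ratio + step_size
    else:
        new_ratio = state.system2_ratio - step_size
    # Keep the ratio a valid probability.
    return MixState(system2_ratio=min(1.0, max(0.0, new_ratio)))


def sample_batch(system1_data, system2_data, state, batch_size, rng):
    """Draw a batch (with replacement) whose composition follows the current ratio."""
    n_long = round(batch_size * state.system2_ratio)
    n_short = batch_size - n_long
    batch = rng.choices(system2_data, k=n_long) + rng.choices(system1_data, k=n_short)
    rng.shuffle(batch)
    return batch
```

In a training loop under these assumptions, one would call `update_ratio` after each periodic validation pass and then build the next batches with `sample_batch`, so the data mix tracks the accuracy signal rather than staying fixed.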
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
1. The data mixing idea to make models generate concise yet correct reasoning traces is new to me.
2. The experiments were done on multiple datasets and models.
3. The model performance is consistently similar or better.
1. The paper needs to be proofread by a native English speaker, as there are some grammar issues. Example: "on reasoning LLMs, enabling the model to learn to generate more concise $\textbf{yet still}$ correct reasoning paths." Use "yet" instead.
2. The authors did not compare with training-free reasoning compression methods ([1-2]).
3. Manually selecting easy-to-hard data and categorizing it as System-1 or System-2 adds labeling overhead to the SFT process.
[1] SEAL: Steerab
- Compressing reasoning length for improved efficiency is an important and timely research direction, and the proposed method demonstrates strong empirical effectiveness.
- The paper includes extensive ablation studies, providing thorough and convincing evaluations.
- The proposed approach is clearly presented, easy to follow, and supported by released code, enhancing reproducibility.
- I am somewhat unconvinced about the necessity of the dynamic re-weighting strategy in Algorithm 1. An ablation study comparing different re-weighting strategies would help clarify its contribution. For example, including simple baselines such as fixed curriculum ratios (large-to-small, small-to-large, or random re-weighting) could provide a clearer understanding of the proposed method’s effectiveness.
- It would be helpful to clarify the inference settings in the experimental setup. For insta
* How to tune the length of reasoning is a well-known issue related to LLM performance. The paper addresses a key challenge in the area by proposing a new method.
* Empirically, the proposed method strikes a sweet balance between accuracy, inference reasoning length (or inference efficiency), and training efficiency (compared to RL).
* Concrete experiments include extensive ablation studies revealing multiple insights.
* The presentation is not clear and lacks citations. Many terms are not well defined.
  - In Lines 40-41, the argument does not cite any prior work. It is not clear which work is the mainstream model merging that represents training-free methods.
  - In the paragraph starting from Line 73, the meaning of long CoT compression is not well defined. It is not clear if the proposed research is on training-based or training-free methods.
  - In Lines 86-87, there is no citation for clarifying which GSM
Clear Motivation & Strong Rationale: The paper is well-motivated by a practical problem (inference efficiency). The analysis in Section 2, which shows that naively mixing data fails, provides a strong and clear justification for the necessity of the proposed dynamic re-weighting approach.
Method Simplicity and Novelty: The TLDR method is elegant. By reformulating the compression problem as a dynamic data-weighting task solved with SFT, it avoids the high complexity and instability of reward-bas
Limited Evaluation Domain: The experiments are conducted exclusively on mathematical reasoning datasets (GSM8K, MATH, AIME, etc.). It is unclear if this "System-1/System-2" data paradigm and the TLDR method will generalize to other reasoning domains, such as commonsense reasoning (e.g., HellaSwag), code generation, or long-form creative/factual writing.
Dependency on Curated "Hard" Data: The method's success seems to rely on the availability of a high-quality "difficult" dataset (like S1) to so
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Softmax · Attention Is All You Need · Sparse Evolutionary Training
