CLORE: Content-Level Optimization for Reasoning Efficiency
Yuyang Wu, Qiyao Xue, Guanxing Lu, Weichen Liu, Zihan Wang, Manling Li, Olexandr Isayev

TL;DR
CLORE is a framework that enhances reasoning efficiency in large language models by editing and removing unnecessary or repetitive content from reasoning traces without altering the final answer.
Contribution
It introduces a content-level optimization method that uses an external model to delete irrelevant or redundant reasoning segments, improving efficiency and content quality.
Findings
CLORE improves the accuracy–efficiency trade-off across multiple benchmarks.
It reduces repetitive reasoning and illegible content.
It is compatible with several existing training methods.
Abstract
Reinforcement learning post-training has improved the reasoning ability of large language models, but often produces unnecessarily long, repetitive, or semantically opaque reasoning traces. Existing efficient reasoning methods mainly regulate response length through explicit budgets or length-aware rewards, leaving intermediate reasoning content weakly supervised. We propose CLORE, a content-level optimization framework that improves reasoning efficiency by editing correct on-policy rollouts. CLORE uses an external augmentation model to delete repetitive segments, illegible or task-irrelevant content, and superfluous reasoning after the solution is established, while preserving the final answer. The resulting augmented–original pairs are optimized with an auxiliary reference-free DPO objective alongside standard policy-gradient training. By restricting augmentation to correct…
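The auxiliary objective mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common reference-free form of the DPO loss, -log sigmoid(beta * (log p(augmented) - log p(original))), with the edited trace treated as the preferred response. The function names, the beta value, and the aux_weight mixing coefficient are hypothetical and are not taken from the paper; the exact formulation CLORE uses may differ.

```python
import math


def reference_free_dpo_loss(logp_augmented, logp_original, beta=0.1):
    """Mean of -log(sigmoid(beta * (logp_aug - logp_orig))) over pairs.

    Inputs are sequence log-probabilities under the current policy.
    No reference-model term appears (assumed "reference-free" form).
    """
    if len(logp_augmented) != len(logp_original) or not logp_augmented:
        raise ValueError("need equal-length, non-empty lists of log-probs")
    losses = []
    for lp_aug, lp_orig in zip(logp_augmented, logp_original):
        margin = beta * (lp_aug - lp_orig)
        # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow.
        losses.append(max(-margin, 0.0) + math.log1p(math.exp(-abs(margin))))
    return sum(losses) / len(losses)


def combined_loss(pg_loss, logp_augmented, logp_original,
                  beta=0.1, aux_weight=1.0):
    """Policy-gradient loss plus the weighted auxiliary preference term.

    aux_weight is a hypothetical mixing coefficient for illustration.
    """
    aux = reference_free_dpo_loss(logp_augmented, logp_original, beta)
    return pg_loss + aux_weight * aux
```

In this sketch the loss equals log 2 when the policy assigns the same log-probability to both traces, and it shrinks as the edited trace becomes more likely than the original, which is the direction of preference the abstract describes.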
