AALC: Large Language Model Efficient Reasoning via Adaptive Accuracy-Length Control
Ruosen Li, Ziming Luo, Quan Zhang, Ruochen Li, Ben Zhou, Ali Payani, Xinya Du

TL;DR
AALC introduces an adaptive reward mechanism for large reasoning models that reduces reasoning length by over 50% without sacrificing accuracy, leading to more efficient and structurally refined outputs.
Contribution
This work presents a novel reinforcement learning approach with an adaptive accuracy-length reward to improve reasoning efficiency in large models.
Findings
Reduces response length by over 50% while maintaining accuracy.
Curbs redundant reasoning patterns like excessive subgoal setting.
Efficiency gains lead to reduced interpretability and less explanatory context.
Abstract
Large reasoning models (LRMs) achieve impressive reasoning capabilities by generating lengthy chains of thought, but this "overthinking" incurs high latency and cost without commensurate accuracy gains. In this work, we introduce AALC, a lightweight, accuracy-aware length reward integrated into reinforcement learning that dynamically balances correctness and brevity during training. By incorporating validation accuracy into the reward and employing a smooth, dynamically scheduled length penalty, AALC delays the length penalty until the target performance is met. Through extensive experiments across standard and out-of-distribution math benchmarks, we show that our approach reduces response length by over 50% while maintaining or even improving the original accuracy. Furthermore, qualitative analysis reveals that our method curbs redundant reasoning patterns such as excessive subgoal setting and…
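The abstract describes the reward only at a high level, so the following is a minimal sketch of one way an accuracy-gated length penalty could look, not the paper's actual formula. The sigmoid gate, the function names, and the parameters `target_acc`, `sharpness`, `max_length`, and `max_penalty` are all assumptions introduced for illustration.

```python
import math


def length_penalty_gate(val_acc: float, target_acc: float, sharpness: float = 20.0) -> float:
    """Hypothetical smooth gate in (0, 1).

    Stays near 0 while validation accuracy is below the target and rises
    toward 1 once the target is met, so the length penalty is delayed.
    """
    return 1.0 / (1.0 + math.exp(-sharpness * (val_acc - target_acc)))


def gated_reward(
    is_correct: bool,
    length: int,
    max_length: int,
    val_acc: float,
    target_acc: float,
    max_penalty: float = 0.5,
) -> float:
    """Illustrative reward: correctness minus a gated, length-proportional penalty.

    This is an assumed form, not the reward defined in the paper.
    """
    correctness = 1.0 if is_correct else 0.0
    # Response length as a fraction of the budget, clipped to [0, 1].
    relative_length = min(length, max_length) / max_length
    gate = length_penalty_gate(val_acc, target_acc)
    return correctness - max_penalty * gate * relative_length


if __name__ == "__main__":
    # Early in training (accuracy far below target) length is barely penalized.
    print(gated_reward(True, 800, 1000, val_acc=0.30, target_acc=0.80))
    # Once the target is met, shorter correct answers score higher.
    print(gated_reward(True, 200, 1000, val_acc=0.90, target_acc=0.80))
    print(gated_reward(True, 800, 1000, val_acc=0.90, target_acc=0.80))
```

In this sketch the penalty scales with both the gate and the relative length, so a policy is first rewarded almost purely for correctness and is pushed toward brevity only after validation accuracy reaches the chosen target.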
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Data Quality and Management
