HAPO: Training Language Models to Reason Concisely via History-Aware Policy Optimization

Chengyu Huang; Zhengxin Zhang; Claire Cardie

arXiv:2505.11225 · cs.CL · November 18, 2025


TL;DR

HAPO is a training method that helps large language models produce concise, correct responses. During training it tracks a history state for each problem (for example, the shortest correct response generated so far) and applies a novel length reward based on that state.

Contribution

This paper introduces HAPO, a novel training approach that uses history-aware rewards to enhance LLMs' concise reasoning capabilities, outperforming prior methods.

Findings

01. Length reductions of 33-59% with minimal accuracy drops (2-5%)
02. Effective in improving reasoning efficiency across math benchmarks
03. Leverages history to guide models towards more concise solutions

Abstract

While scaling the length of responses at test-time has been shown to markedly improve the reasoning abilities and performance of large language models (LLMs), it often results in verbose outputs and increases inference cost. Prior approaches for efficient test-time scaling, typically using universal budget constraints or query-level length optimization, do not leverage historical information from previous encounters with the same problem during training. We hypothesize that this limits their ability to progressively make solutions more concise over time. To address this, we present History-Aware Policy Optimization (HAPO), which keeps track of a history state (e.g., the minimum length over previously generated correct responses) for each problem. HAPO employs a novel length reward function based on this history state to incentivize the discovery of correct solutions that are more…
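The abstract describes the mechanism only at a high level, so the Python sketch below illustrates the idea under stated assumptions. The history state follows the abstract's own example: the minimum length over previously generated correct responses for each problem. The length reward is an assumed cosine-shaped function of the ratio between a response's length and that history state, chosen for illustration; the paper's exact reward function is not given in this excerpt. `HistoryTracker`, `length_reward`, and `train_step` are hypothetical names, not the API of the hcy123902/hapo repository.

```python
import math
from typing import Dict, List, Optional, Tuple


class HistoryTracker:
    """Per-problem history state: shortest correct response length seen so far.

    Hypothetical helper for illustration only; not the hcy123902/hapo API.
    """

    def __init__(self) -> None:
        self._min_correct_len: Dict[str, int] = {}

    def get(self, problem_id: str) -> Optional[int]:
        # None means no correct response has been recorded for this problem yet.
        return self._min_correct_len.get(problem_id)

    def update(self, problem_id: str, length: int, correct: bool) -> None:
        # Only correct responses can tighten the history state.
        if not correct:
            return
        prev = self._min_correct_len.get(problem_id)
        if prev is None or length < prev:
            self._min_correct_len[problem_id] = length


def length_reward(length: int, correct: bool, history: Optional[int]) -> float:
    """Assumed cosine-shaped length reward relative to the history state h.

    About 0 at length == h, positive when shorter, negative when longer
    (floored at -1 from 2h onward). Incorrect responses never get a positive
    length bonus. Assumes lengths are positive token counts.
    """
    if history is None:
        return 0.0  # nothing to compare against yet
    score = math.cos(min(math.pi / 2 * length / history, math.pi))
    return score if correct else min(score, 0.0)


tracker = HistoryTracker()


def train_step(samples: List[Tuple[str, int, bool]]) -> List[float]:
    # Score every sample against the history *before* this step, then update it.
    rewards = [length_reward(n, ok, tracker.get(pid)) for pid, n, ok in samples]
    for pid, n, ok in samples:
        tracker.update(pid, n, ok)
    return rewards


# Hypothetical (problem_id, token_length, is_correct) samples over two epochs.
print(train_step([("p1", 400, True), ("p1", 300, True), ("p1", 120, False)]))
# -> [0.0, 0.0, 0.0]  (no history yet)
print(tracker.get("p1"))  # -> 300
second = train_step([("p1", 150, True), ("p1", 450, True), ("p1", 100, False)])
print([round(r, 3) for r in second])  # -> [0.707, -0.707, 0.0]
print(tracker.get("p1"))  # -> 150
```

Rewards are computed against the history from before the step so that a response is never compared with itself, and the state only moves when a shorter correct response appears, which is what lets the target tighten across epochs. In a full training setup this term would presumably be combined with a correctness reward inside the policy-optimization objective; that combination is not shown here.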


Code & Models

Repositories

hcy123902/hapo (PyTorch, official)

Videos

HAPO: Training Language Models to Reason Concisely via History-Aware Policy Optimization (Underline)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications