TEMPO: Scaling Test-time Training for Large Reasoning Models

Qingyang Zhang; Xinke Kong; Haitao Wu; Qinghua Hu; Minghao Wu; Baosong Yang; Yu Cheng; Yun Luo; Ganqu Cui; Changqing Zhang

arXiv:2604.19295·cs.LG·April 22, 2026

TL;DR

TEMPO is an EM-based test-time training framework for large reasoning models. It alternates policy refinement with critic recalibration, which sustains performance gains and preserves diversity.

Contribution

The paper formalizes test-time training (TTT) as an Expectation-Maximization (EM) procedure, reintroduces the critic recalibration step that prior methods omit, and reports substantial accuracy gains on reasoning tasks.

Findings

01. TEMPO improves OLMO3-7B accuracy from 33.0% to 51.1%.

02. TEMPO improves Qwen3-14B accuracy from 42.3% to 65.8%.

03. The method maintains high diversity throughout test-time training.

Abstract

Test-time training (TTT) adapts model parameters on unlabeled test instances at inference time, which continuously extends capabilities beyond the reach of offline training. Despite initial gains, existing TTT methods for large reasoning models (LRMs) plateau quickly and do not benefit from additional test-time compute. Without external calibration, the self-generated reward signal increasingly drifts as the policy model evolves, leading to both performance plateaus and diversity collapse. We propose TEMPO, a TTT framework that interleaves policy refinement on unlabeled questions with periodic critic recalibration on a labeled dataset. By formalizing this alternating procedure through the Expectation-Maximization (EM) algorithm, we reveal that prior methods can be interpreted as incomplete variants that omit the crucial recalibration step. Reintroducing this step tightens the evidence lower bound (ELBO) and…
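
As a rough illustration of the alternating schedule the abstract describes, the Python sketch below interleaves a policy-refinement step with a periodic critic-recalibration step. It is not the authors' implementation. The function `alternate`, its parameters, and the toy integer "policy" and "critic" states are all hypothetical, and the abstract does not specify the actual update rules or how often recalibration runs.

```python
from typing import Callable, List, Tuple, TypeVar

P = TypeVar("P")  # policy state (hypothetical placeholder type)
C = TypeVar("C")  # critic state (hypothetical placeholder type)


def alternate(
    policy: P,
    critic: C,
    refine_policy: Callable[[P, C], P],
    recalibrate_critic: Callable[[C, P], C],
    steps: int,
    recalibrate_every: int,
) -> Tuple[P, C, List[str]]:
    """Run `steps` policy refinements, recalibrating the critic periodically.

    refine_policy stands in for an update on unlabeled questions;
    recalibrate_critic stands in for an update on a labeled dataset.
    Both are assumptions made for illustration only.
    """
    log: List[str] = []
    for step in range(1, steps + 1):
        # Policy refinement, scored by the current (possibly stale) critic.
        policy = refine_policy(policy, critic)
        log.append("refine")
        # Periodic recalibration of the critic against the updated policy.
        if step % recalibrate_every == 0:
            critic = recalibrate_critic(critic, policy)
            log.append("recalibrate")
    return policy, critic, log


# Toy states: the policy is a version counter, and the critic stores the
# policy version it was last calibrated against.
bump = lambda p, c: p + 1   # each refinement advances the policy by one version
sync = lambda c, p: p       # recalibration aligns the critic with the policy

policy, critic, log = alternate(0, 0, bump, sync, steps=4, recalibrate_every=2)
print(policy, critic)  # 4 4: the critic ends in step with the policy

stale_policy, stale_critic, _ = alternate(0, 0, bump, sync, steps=4, recalibrate_every=5)
print(stale_policy, stale_critic)  # 4 0: the critic was never refreshed
```

Setting `recalibrate_every` larger than `steps` means the critic is never refreshed, a toy analogue of the incomplete variants the abstract mentions, in which the reward signal drifts as the policy evolves.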

Code & Models

Repositories

qingyangzhang/TEMPO (GitHub)