Improving Latent Generalization Using Test-time Compute

Arslan Chaudhry; Sridhar Thiagarajan; Andrew Lampinen

arXiv:2604.01430 · cs.LG · April 3, 2026

TL;DR

This paper studies how test-time compute, specifically chain-of-thought reasoning trained via reinforcement learning, improves latent generalization in language models where train-time data augmentation falls short.

Contribution

It introduces a reinforcement learning approach to teach models to use test-time reasoning, improving latent generalization and out-of-distribution performance.

Findings

01. Test-time thinking improves latent generalization on in-distribution tasks.

02. Models trained with RL-generated chains-of-thought outperform baselines on new knowledge.

03. Thinking models achieve above-chance performance on reversal tasks through generate-and-verify.
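
The generate-and-verify strategy named in the last finding can be illustrated with a toy sketch. This is not the paper's implementation: the fact table, the names in it, and the functions `recall_forward`, `generate_candidates`, and `answer_reverse` are all hypothetical stand-ins. The sketch assumes knowledge is only retrievable in the trained direction (name to description) and shows how a reverse question can still be answered by proposing candidates and checking each one with a forward lookup.

```python
from typing import Callable, Iterable, Optional

# Hypothetical stand-in for in-weights knowledge: facts can be recalled
# only in the trained direction, name -> description.
FORWARD_FACTS = {
    "Alice Moreno": "the composer of Night Tides",
    "Ben Okafor": "the founder of Larkspur Labs",
    "Chen Wei": "the author of The Glass Orchard",
}


def recall_forward(name: str) -> Optional[str]:
    """Forward recall: the only direction the toy 'model' supports."""
    return FORWARD_FACTS.get(name)


def generate_candidates() -> Iterable[str]:
    """Propose candidate names; a real model would sample these while thinking."""
    return list(FORWARD_FACTS.keys())


def answer_reverse(
    description: str,
    generate: Callable[[], Iterable[str]],
    verify_forward: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Answer 'who is <description>?' without any reverse lookup table.

    Each candidate is verified by recalling its forward fact and comparing
    the result with the queried description.
    """
    for candidate in generate():
        if verify_forward(candidate) == description:
            return candidate
    return None  # no candidate survived verification


if __name__ == "__main__":
    print(answer_reverse("the founder of Larkspur Labs",
                         generate_candidates, recall_forward))
```

In this toy setting the candidate list is exhaustive, so verification always succeeds when the fact exists. With sampled candidates the search can miss the right name, which fits the finding's more modest claim of above-chance performance.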

Abstract

Language Models (LMs) exhibit two distinct mechanisms for knowledge acquisition: in-weights learning (i.e., encoding information within the model weights) and in-context learning (ICL). Although these two modes offer complementary strengths, in-weights learning frequently struggles to facilitate deductive reasoning over the internalized knowledge. We characterize this limitation as a deficit in latent generalization, of which the reversal curse is one example. Conversely, in-context learning demonstrates highly robust latent generalization capabilities. To improve latent generalization from in-weights knowledge, prior approaches rely on train-time data augmentation, yet these techniques are task-specific, scale poorly, and fail to generalize to out-of-distribution knowledge. To overcome these shortcomings, this work studies how models can be taught to use test-time compute, or…
