TQL: Scaling Q-Functions with Transformers by Preventing Attention Collapse

Perry Dong; Kuo-Han Hung; Alexander Swerdlow; Dorsa Sadigh; Chelsea Finn

arXiv:2602.01439·cs.LG·February 3, 2026

Open Access

TL;DR

This paper introduces TQL, a method that stabilizes transformer-based value functions in reinforcement learning by preventing attention score collapse through entropy control, enabling effective scaling and significant performance improvements.

Contribution

The paper identifies attention score collapse as a key obstacle in scaling transformers for RL value functions and proposes entropy-based control to stabilize training and improve performance.

Findings

01 Up to 43% performance improvement with larger models

02 Attention scores collapse as model capacity increases

03 Entropy control stabilizes transformer training in RL

Abstract

Despite scale driving substantial recent advancements in machine learning, reinforcement learning (RL) methods still primarily use small value functions. Naively scaling value functions -- including with a transformer architecture, which is known to be highly scalable -- often results in learning instability and worse performance. In this work, we ask: what prevents transformers from scaling effectively for value functions? Through empirical analysis, we identify the critical failure mode in this scaling: attention scores collapse as capacity increases. Our key insight is that we can effectively prevent this collapse and stabilize training by controlling the entropy of the attention scores, thereby enabling the use of larger models. To this end, we propose Transformer Q-Learning (TQL), a method that unlocks the scaling potential of transformers in learning value functions in RL. Our…
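The abstract names the quantity being controlled (the entropy of the attention scores) but this listing does not say how TQL controls it. The following is a minimal NumPy sketch of what that quantity looks like, not the authors' method. It computes the mean Shannon entropy of softmax attention rows and, as one hypothetical control mechanism, searches for a softmax temperature that brings the entropy to a chosen target. The function names, the temperature approach, and the bisection bounds are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def attention_entropy(logits):
    """Mean Shannon entropy (in nats) over all softmax attention rows.

    `logits` has shape (..., num_queries, num_keys). Values near 0 mean each
    query attends to almost one key (collapsed attention); values near
    log(num_keys) mean near-uniform attention.
    """
    p = softmax(logits, axis=-1)
    row_entropy = -np.sum(p * np.log(p + 1e-12), axis=-1)
    return float(np.mean(row_entropy))


def temperature_for_target_entropy(logits, target, lo=1e-3, hi=1e3, iters=60):
    """Hypothetical control: find t so that softmax(logits / t) hits `target`.

    Entropy is nondecreasing in the temperature t, so a bisection on a log
    scale converges. This is an illustrative stand-in, not the paper's scheme.
    """
    for _ in range(iters):
        mid = np.sqrt(lo * hi)  # geometric midpoint
        if attention_entropy(logits / mid) < target:
            lo = mid  # entropy too low: raise the temperature
        else:
            hi = mid  # entropy high enough: lower the temperature
    return float(np.sqrt(lo * hi))
```

For example, large-magnitude logits give an entropy close to zero, and dividing them by the temperature returned for a target of `0.5 * np.log(num_keys)` moves the mean row entropy to about half of its maximum. In a trained model one would more likely use a learned temperature or a regularization term than a per-batch search; the search is used here only because it is easy to verify.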

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning