Compute-Optimal Scaling for Value-Based Deep RL

Preston Fu; Oleh Rybkin; Zhiyuan Zhou; Michal Nauman; Pieter Abbeel; Sergey Levine; Aviral Kumar

arXiv:2508.14881·cs.LG·August 26, 2025

TL;DR

This paper explores how to optimally allocate compute resources in value-based deep reinforcement learning by analyzing the interplay between model size, batch size, and update-to-data ratio, introducing the concept of TD-overfitting.

Contribution

It provides an empirical analysis and practical guidelines for compute-optimal scaling in value-based deep RL, highlighting the phenomenon of TD-overfitting and its implications for choosing batch size.

Findings

01

Large models are less affected by batch size increases, enabling more efficient scaling.

02

TD-overfitting occurs in small models, reducing Q-function accuracy with larger batches.

03

Guidelines for balancing model capacity and update frequency to maximize compute efficiency.

Abstract

As models grow larger and training them becomes expensive, it becomes increasingly important to scale training recipes not just to larger models and more data, but to do so in a compute-optimal manner that extracts maximal performance per unit of compute. While such scaling has been well studied for language modeling, reinforcement learning (RL) has received less attention in this regard. In this paper, we investigate compute scaling for online, value-based deep RL. These methods present two primary axes for compute allocation: model capacity and the update-to-data (UTD) ratio. Given a fixed compute budget, we ask: how should resources be partitioned across these axes to maximize sample efficiency? Our analysis reveals a nuanced interplay between model size, batch size, and UTD. In particular, we identify a phenomenon we call TD-overfitting: increasing the batch quickly harms Q-function…
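To make the fixed-budget trade-off concrete, the sketch below spends a single compute budget on different combinations of model size and UTD ratio and reports how many environment steps each combination can afford. It rests on a hypothetical cost model, assumed here and not taken from the paper's accounting: compute C ≈ k · N · B · σ · D, where N is the parameter count, B the batch size, σ the UTD ratio, D the number of environment steps, and k a rough per-parameter FLOPs constant. The function names and the default k = 6 are illustrative choices.

```python
from itertools import product

# Hypothetical cost model (an assumption, not the paper's exact accounting):
# C ~= k * N * B * sigma * D
#   N     = model parameters
#   B     = batch size per gradient step
#   sigma = UTD ratio (gradient steps per environment step)
#   D     = environment steps
#   k     = rough FLOPs per parameter per sample (6 ~ forward + backward)


def affordable_env_steps(budget_flops, n_params, batch_size, utd, k=6.0):
    """Environment steps D that fit in the budget for one (N, B, sigma) choice."""
    return budget_flops / (k * n_params * batch_size * utd)


def enumerate_allocations(budget_flops, model_sizes, utd_ratios, batch_size):
    """List every (N, sigma) split of one fixed budget with its affordable D."""
    rows = []
    for n_params, utd in product(model_sizes, utd_ratios):
        d = affordable_env_steps(budget_flops, n_params, batch_size, utd)
        rows.append((n_params, utd, d))
    return rows


if __name__ == "__main__":
    # Illustrative numbers only; they are not settings from the paper.
    for n, utd, d in enumerate_allocations(1e16, [1e6, 4e6], [1, 4], 256):
        print(f"N={n:.0e}  UTD={utd}  affordable env steps={d:.3e}")
```

Under this cost model, doubling either N or σ halves the affordable D. Which axis is worth paying for is therefore an empirical question, and the abstract's analysis of model size, batch size, and UTD (including TD-overfitting) is aimed at answering it.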

Taxonomy

Topics: Machine Learning and Data Classification