Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

Jesse Farebrother; Jordi Orbay; Quan Vuong; Adrien Ali Taïga; Yevgen Chebotar; Ted Xiao; Alex Irpan; Sergey Levine; Pablo Samuel Castro; Aleksandra Faust; Aviral Kumar; Rishabh Agarwal

arXiv:2403.03950·cs.LG·March 7, 2024·3 cites

TL;DR

This paper proposes replacing the traditional mean squared error regression loss with a categorical cross-entropy classification loss for training value functions in deep reinforcement learning, leading to improved scalability and performance across various domains.

Contribution

The authors demonstrate that training value functions with a classification loss improves scalability and performance in deep RL, outperforming regression-based methods on multiple challenging tasks.

Findings

01. Improved performance on Atari 2600 games with SoftMoEs.

02. State-of-the-art results in robotic manipulation with Q-transformers.

03. Successful application to Chess and Wordle with high-capacity Transformers.

Abstract

Value functions are a central component of deep reinforcement learning (RL). These functions, parameterized by neural networks, are trained using a mean squared error regression objective to match bootstrapped target values. However, scaling value-based RL methods that use regression to large networks, such as high-capacity Transformers, has proven challenging. This difficulty is in stark contrast to supervised learning: by leveraging a cross-entropy classification loss, supervised methods have scaled reliably to massive networks. Observing this discrepancy, in this paper, we investigate whether the scalability of deep RL can also be improved simply by using classification in place of regression for training value functions. We demonstrate that value functions trained with categorical cross-entropy significantly improve performance and scalability in a variety of domains. These…
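To make the contrast in the abstract concrete, the following is a minimal sketch of what swapping a regression loss for a categorical cross-entropy loss can look like for a single scalar value target. It is an illustration under stated assumptions, not the authors' implementation: the two-hot encoding of the target, the fixed range `v_min`/`v_max`, the bin count, and all function names are hypothetical choices made for this example, and the abstract as shown here does not specify how scalar targets are turned into categorical distributions.

```python
import math


def two_hot(target, v_min, v_max, num_bins):
    """Encode a scalar target as a distribution over evenly spaced bin centers.

    Assumption for this sketch: the probability mass is split between the two
    nearest bin centers so that the distribution's mean equals the (clipped) target.
    """
    target = min(max(target, v_min), v_max)  # clip into the supported range
    step = (v_max - v_min) / (num_bins - 1)
    pos = (target - v_min) / step            # fractional bin index
    lower = min(int(math.floor(pos)), num_bins - 2)
    upper_weight = pos - lower
    probs = [0.0] * num_bins
    probs[lower] = 1.0 - upper_weight
    probs[lower + 1] = upper_weight
    return probs


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(logits, target_probs):
    """Categorical cross-entropy between predicted logits and a target distribution."""
    return -sum(t * lp for t, lp in zip(target_probs, log_softmax(logits)))


def expected_value(logits, v_min, v_max):
    """Recover a scalar value estimate as the mean of the predicted distribution."""
    step = (v_max - v_min) / (len(logits) - 1)
    probs = [math.exp(lp) for lp in log_softmax(logits)]
    return sum(p * (v_min + i * step) for i, p in enumerate(probs))


def mse(prediction, target):
    """The regression baseline: squared error on the scalar prediction."""
    return (prediction - target) ** 2


if __name__ == "__main__":
    logits = [0.0, 0.0, 0.0, 0.0, 0.0]  # an untrained, uniform prediction
    target = 0.3                         # stands in for a bootstrapped target value
    target_probs = two_hot(target, v_min=-1.0, v_max=1.0, num_bins=5)
    print("classification loss:", cross_entropy(logits, target_probs))
    print("regression loss:", mse(expected_value(logits, -1.0, 1.0), target))
```

In this sketch the network would output one logit per bin instead of a single scalar, and the scalar value needed for action selection or bootstrapping is read back as the mean of the predicted distribution.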

Taxonomy

Topics: Machine Learning and Data Classification