Is Value Functions Estimation with Classification Plug-and-play for Offline Reinforcement Learning?

Denis Tarasov; Kirill Brilliantov; Dmitrii Kharlapenko

arXiv:2406.06309 · cs.LG · November 19, 2024

TL;DR

This paper empirically investigates replacing value function regression with classification in offline RL, revealing mixed impacts on performance across various algorithms and tasks, highlighting potential benefits and risks.

Contribution

It provides a large-scale empirical analysis of classification-based value estimation in offline RL, offering insights into when it improves or harms performance.

Findings

01 Classification can outperform regression in some tasks and algorithms.

02 The effects are inconsistent across algorithms and domains.

03 Task and algorithm both need to be considered when using classification for value functions.

Abstract

In deep Reinforcement Learning (RL), value functions are typically approximated using deep neural networks and trained via mean squared error regression objectives to fit the true value functions. Recent research has proposed an alternative approach, utilizing the cross-entropy classification objective, which has demonstrated improved performance and scalability of RL algorithms. However, existing studies have not extensively benchmarked the effects of this replacement across various domains, as the primary objective was to demonstrate the efficacy of the concept across a broad spectrum of tasks, without delving into in-depth analysis. Our work seeks to empirically investigate the impact of such a replacement in an offline RL setup and analyze the effects of different aspects on performance. Through large-scale experiments conducted across a diverse range of tasks using different…
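To make the regression-versus-classification contrast concrete, here is a minimal sketch in plain Python. It assumes a two-hot target encoding over a fixed grid of value bins, which is one common way to turn a scalar value target into a categorical one; the text above does not say which encoding the paper uses, so treat the encoding, the bin range (`v_min`, `v_max`), and all function names as illustrative assumptions rather than the authors' implementation.

```python
import math

# Hypothetical helper names; the two-hot encoding is an assumed choice,
# not something confirmed by the abstract above.

def two_hot(target, v_min, v_max, num_bins):
    """Spread a scalar target over the two nearest bin centers (num_bins >= 2)."""
    target = min(max(target, v_min), v_max)       # clip into the supported range
    width = (v_max - v_min) / (num_bins - 1)      # spacing between bin centers
    pos = (target - v_min) / width                # fractional bin index
    lower = min(int(math.floor(pos)), num_bins - 2)
    upper_weight = pos - lower
    probs = [0.0] * num_bins
    probs[lower] = 1.0 - upper_weight
    probs[lower + 1] = upper_weight
    return probs

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def cross_entropy_value_loss(logits, target, v_min, v_max):
    """Classification objective: cross-entropy against the two-hot target."""
    probs = two_hot(target, v_min, v_max, len(logits))
    log_p = log_softmax(logits)
    return -sum(p * lp for p, lp in zip(probs, log_p))

def expected_value(logits, v_min, v_max):
    """Recover a scalar value as the mean of the predicted categorical."""
    n = len(logits)
    width = (v_max - v_min) / (n - 1)
    centers = [v_min + i * width for i in range(n)]
    return sum(math.exp(lp) * c for lp, c in zip(log_softmax(logits), centers))

def mse_value_loss(prediction, target):
    """Regression objective: squared error on the scalar prediction."""
    return (prediction - target) ** 2
```

In this sketch the critic outputs one logit per bin instead of a single scalar, the loss compares the predicted distribution with the encoded target, and `expected_value` maps the distribution back to a scalar wherever the algorithm needs a value estimate.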

Code & Models

Repositories

dt6a/clorl (JAX, official)

Taxonomy

Topics: Innovation Diffusion and Forecasting · Evolutionary Algorithms and Applications · Reinforcement Learning in Robotics