Overestimation, Overfitting, and Plasticity in Actor-Critic: the Bitter Lesson of Reinforcement Learning

Michal Nauman; Michał Bortkiewicz; Piotr Miłoś; Tomasz Trzciński; Mateusz Ostaszewski; Marek Cygan

arXiv:2403.00514·cs.LG·June 21, 2024·3 cites

Open Access

TL;DR

This paper systematically evaluates over 60 off-policy RL agents that combine various regularization techniques across multiple tasks. It finds that a simple, well-regularized agent such as Soft Actor-Critic can find better policies than more complex methods.

Contribution

It provides a comprehensive empirical analysis of regularization techniques in off-policy RL, highlighting their effects on overestimation, overfitting, and plasticity across diverse tasks.

Findings

01 Certain regularization combinations are consistently effective across tasks.

02 A well-regularized Soft Actor-Critic outperforms complex algorithms in policy quality.

03 Regularization improves sample efficiency and policy robustness.

Abstract

Recent advancements in off-policy Reinforcement Learning (RL) have significantly improved sample efficiency, primarily due to the incorporation of various forms of regularization that enable more gradient update steps than traditional agents. However, many of these techniques have been tested in limited settings, often on tasks from single simulation benchmarks and against well-known algorithms rather than a range of regularization approaches. This limits our understanding of the specific mechanisms driving RL improvements. To address this, we implemented over 60 different off-policy agents, each integrating established regularization techniques from recent state-of-the-art algorithms. We tested these agents across 14 diverse tasks from 2 simulation benchmarks, measuring training metrics related to overestimation, overfitting, and plasticity loss -- issues that motivate the examined…
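One common way to quantify the overestimation mentioned in the abstract is to compare a critic's value estimates with the discounted returns actually observed along an episode. The sketch below illustrates that idea only; it is not the paper's measurement protocol, which the abstract does not specify. The function names `discounted_returns` and `overestimation_gap`, and the default `gamma=0.99`, are hypothetical. It also assumes a finished (not truncated) episode and ignores the entropy bonus that a Soft Actor-Critic critic would include in its targets.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Monte Carlo discounted return from each step of one finished episode."""
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    # Walk backwards so each return reuses the one computed for the next step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def overestimation_gap(q_estimates, rewards, gamma=0.99):
    """Mean of (critic estimate - observed return) over the episode.

    A positive value suggests the critic overestimates on this episode;
    this is an illustrative metric, not the one used in the paper.
    """
    q = np.asarray(q_estimates, dtype=np.float64)
    return float(np.mean(q - discounted_returns(rewards, gamma)))
```

In use, `q_estimates` would hold the critic's predictions for the state-action pairs visited in an evaluation episode, and `rewards` the rewards collected at those steps; averaging the gap over several episodes gives a less noisy estimate.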

Taxonomy

Topics: Embodied and Extended Cognition