Bigger, Regularized, Optimistic: scaling for compute and sample-efficient continuous control

Michal Nauman; Mateusz Ostaszewski; Krzysztof Jankowski; Piotr Miłoś; Marek Cygan

arXiv:2405.16158 · cs.LG · December 4, 2024 · 1 citation

Open Access · 1 repository

TL;DR

This paper shows that scaling model capacity combined with regularization and optimistic exploration can significantly improve sample efficiency and performance in continuous control reinforcement learning tasks.

Contribution

The paper introduces BRO (Bigger, Regularized, Optimistic), an algorithm that combines scaled critic networks, strong regularization, and optimistic exploration to reach state-of-the-art results on continuous control benchmarks.

Findings

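01. BRO significantly outperforms the leading model-based and model-free algorithms across 40 tasks from the DeepMind Control, MetaWorld, and MyoSuite benchmarks.

02. BRO achieves near-optimal policies in challenging tasks.

03. Strong regularization makes scaling the critic networks effective, which improves sample efficiency.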

Abstract

Sample efficiency in Reinforcement Learning (RL) has traditionally been driven by algorithmic enhancements. In this work, we demonstrate that scaling can also lead to substantial improvements. We conduct a thorough investigation into the interplay of scaling model capacity and domain-specific RL enhancements. These empirical findings inform the design choices underlying our proposed BRO (Bigger, Regularized, Optimistic) algorithm. The key innovation behind BRO is that strong regularization allows for effective scaling of the critic networks, which, paired with optimistic exploration, leads to superior performance. BRO achieves state-of-the-art results, significantly outperforming the leading model-based and model-free algorithms across 40 complex tasks from the DeepMind Control, MetaWorld, and MyoSuite benchmarks. BRO is the first model-free algorithm to achieve near-optimal policies in…
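The abstract names two ingredients, a strongly regularized critic that can be scaled up and optimistic exploration, without specifying either. The sketch below is only an illustration of how such pieces might fit together, not the authors' implementation: it assumes layer normalization with residual connections as the regularizer and a mean-plus-standard-deviation bonus over two critics as the optimism term. All function names, the `beta` parameter, and the layer sizes are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each feature vector to zero mean and unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def init_critic(rng, in_dim, width, n_blocks):
    # Hypothetical initializer: He-scaled Gaussian weights, zero biases.
    def dense(fan_in, fan_out):
        w = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
        return w, np.zeros(fan_out)
    return {
        "inp": dense(in_dim, width),
        "blocks": [(dense(width, width), dense(width, width))
                   for _ in range(n_blocks)],
        "out": dense(width, 1),
    }

def critic_forward(params, obs_act):
    # Input projection followed by layer norm and ReLU.
    w, b = params["inp"]
    h = np.maximum(layer_norm(obs_act @ w + b), 0.0)
    # Residual blocks with layer norm after every dense layer (an assumption).
    for (w1, b1), (w2, b2) in params["blocks"]:
        z = np.maximum(layer_norm(h @ w1 + b1), 0.0)
        z = layer_norm(z @ w2 + b2)
        h = h + z
    w, b = params["out"]
    return (h @ w + b)[..., 0]  # one Q-value per batch row

def optimistic_value(q_values, beta=1.0):
    # Assumed optimism rule: ensemble mean plus beta times the disagreement.
    q = np.stack(q_values)
    return q.mean(axis=0) + beta * q.std(axis=0)

rng = np.random.default_rng(0)
critics = [init_critic(rng, in_dim=6, width=64, n_blocks=2) for _ in range(2)]
batch = rng.normal(size=(4, 6))  # 4 concatenated (observation, action) rows
q_each = [critic_forward(p, batch) for p in critics]
q_opt = optimistic_value(q_each)
```

Widening `width` or raising `n_blocks` is the "bigger" knob in this sketch; the normalization keeps activations at a fixed scale as capacity grows, which is one plausible reading of why regularization and scaling are paired.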

Code & Models

Repositories

naumix/BiggerRegularizedOptimistic (JAX, official)

Taxonomy

Topics: Advanced Control Systems Optimization · Control Systems and Identification · Neural Networks and Applications