Towards General-Purpose Model-Free Reinforcement Learning

Scott Fujimoto; Pierluca D'Oro; Amy Zhang; Yuandong Tian; Michael Rabbat

arXiv:2501.16142·cs.LG·January 28, 2025

TL;DR

This paper introduces MR.Q, a model-free deep reinforcement learning algorithm that learns model-based representations which approximately linearize the value function. This lets it perform well across diverse benchmarks with a single hyperparameter set, advancing towards general-purpose RL.

Contribution

The paper proposes MR.Q, a novel model-free RL method that leverages model-based representations to achieve broad applicability without the added complexity of planning or per-benchmark hyperparameter tuning.

Findings

01

MR.Q performs competitively across multiple RL benchmarks.

02

It uses a single hyperparameter set for diverse tasks.

03

The approach bridges the gap between model-based and model-free RL.

Abstract

Reinforcement learning (RL) promises a framework for near-universal problem-solving. In practice, however, RL algorithms are often tailored to specific benchmarks, relying on carefully tuned hyperparameters and algorithmic choices. Recently, powerful model-based RL methods have shown impressive general results across benchmarks but come at the cost of increased complexity and slow run times, limiting their broader applicability. In this paper, we attempt to find a unifying model-free deep RL algorithm that can address a diverse class of domains and problem settings. To achieve this, we leverage model-based representations that approximately linearize the value function, taking advantage of the denser task objectives used by model-based RL while avoiding the costs associated with planning or simulated trajectories. We evaluate our algorithm, MR.Q, on a variety of common RL benchmarks with…
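To make "representations that approximately linearize the value function" concrete, here is a hypothetical sketch of the standard identity behind the idea. It is not MR.Q's implementation. It assumes an embedding z that linearly predicts its reward (r = w_r · z) and its own expected next embedding (z' = M z). Under those assumptions the discounted return is itself linear in z, Q(z) = w · z, where w solves w = w_r + γ Mᵀw. The names w_r, M, gamma and all helper functions are illustrative assumptions. The code checks the identity by comparing the linear weights against an unrolled sum of predicted rewards.

```python
"""Hypothetical sketch: a value function that is linear in an embedding.

Assumes (for illustration only) an embedding z whose reward is r = w_r . z
and whose expected next embedding is M z. This is not MR.Q's actual code.
"""
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(m: Matrix, v: Vector) -> Vector:
    # Row-by-row product m @ v.
    return [dot(row, v) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def linear_value_weights(w_r: Vector, m: Matrix, gamma: float, iters: int = 1000) -> Vector:
    """Iterate w <- w_r + gamma * M^T w; the fixed point gives Q(z) = w . z.

    Converges when the spectral radius of gamma * M is below 1.
    """
    m_t = transpose(m)
    w = list(w_r)
    for _ in range(iters):
        w = [wr + gamma * x for wr, x in zip(w_r, matvec(m_t, w))]
    return w


def rollout_return(z: Vector, w_r: Vector, m: Matrix, gamma: float, horizon: int = 1000) -> float:
    """Truncated discounted sum of predicted rewards along z_{t+1} = M z_t."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        total += discount * dot(w_r, z)
        z = matvec(m, z)
        discount *= gamma
    return total


if __name__ == "__main__":
    w_r = [1.0, 0.5]              # hypothetical linear reward head
    m = [[0.5, 0.1], [0.2, 0.3]]  # hypothetical linear embedding dynamics
    gamma = 0.9
    z = [1.0, -2.0]
    w = linear_value_weights(w_r, m, gamma)
    # The two numbers should agree up to floating-point error.
    print(dot(w, z), rollout_return(z, w_r, m, gamma))
```

The fixed point w = w_r + γMᵀw plays the role of a Bellman equation over the embedding. A learned encoder would satisfy these linear relations only approximately, so the value function would be only approximately linear in its features.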

Peer Reviews

Decision·ICLR 2025 Spotlight

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. MR.Q combines various state-of-the-art techniques and demonstrates promising performance gains with a single set of hyperparameters across multiple benchmarks, showing both efficacy and resource efficiency.
2. The paper provides detailed ablation studies, offering insights that may benefit future research in this field.

Weaknesses

The theoretical motivation presented by the authors is somewhat confusing and challenging to interpret. Please refer to Question 3 for details.

Reviewer 02 · Rating 8 · Confidence 4

Strengths

* The paper is very well presented and has a clear structure.
* The core approach (using world-model representations to support policy learning) is sound and well motivated in prior literature.
* Strong efforts are made to make the model hyperparameter-insensitive, which is particularly valuable in RL, where hyperparameter tuning is often ignored or exploited to give the appearance of improved performance.
* The evaluation is extensive in the number of benchmarks and ablations. A consistent set

Weaknesses

Hyperparameters

One of the primary aims of the paper is hyperparameter robustness, with multiple components designed to improve robustness and a single set of hyperparameters used to evaluate MR.Q. However, aiming for robustness does not allow hyperparameters to be ignored in evaluation. In fact, *more* attention should be paid to hyperparameters, to demonstrate that MR.Q is less sensitive to hyperparameters than baseline methods, or performs better when all algorithms are constraine

Reviewer 03 · Rating 8 · Confidence 4

Strengths

The authors propose an algorithm, named MR.Q, that performs well across domains. They conduct well-selected ablations on MR.Q to highlight the effect of their design decisions on the algorithm. The empirical results are broad and seem sound. Generally, a useful contribution to the field.

Weaknesses

The authors argue that the benefits of model-based approaches may stem from their learned representations rather than from their planning abilities. While this may be true to some degree, we suspect that it is highly specific to the domains they conduct experiments on. In particular, Gym and DMC environments and most Atari games do not require much planning to be solved. However, we know that in other domains planning plays a critical role. Therefore, we believe it is important for the authors

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Sparse Evolutionary Training