Towards General-Purpose Model-Free Reinforcement Learning
Scott Fujimoto, Pierluca D'Oro, Amy Zhang, Yuandong Tian, Michael Rabbat

TL;DR
This paper introduces MR.Q, a model-free deep reinforcement learning algorithm that uses approximate linearization of the value function to perform well across diverse benchmarks with a single hyperparameter set, advancing towards general-purpose RL.
Contribution
The paper proposes MR.Q, a novel model-free RL method leveraging model-based representations for broad applicability without increased complexity or tuning.
Findings
MR.Q performs competitively across multiple RL benchmarks.
It uses a single hyperparameter set for diverse tasks.
The approach bridges the gap between model-based and model-free RL.
Abstract
Reinforcement learning (RL) promises a framework for near-universal problem-solving. In practice, however, RL algorithms are often tailored to specific benchmarks, relying on carefully tuned hyperparameters and algorithmic choices. Recently, powerful model-based RL methods have shown impressive general results across benchmarks but come at the cost of increased complexity and slow run times, limiting their broader applicability. In this paper, we attempt to find a unifying model-free deep RL algorithm that can address a diverse class of domains and problem settings. To achieve this, we leverage model-based representations that approximately linearize the value function, taking advantage of the denser task objectives used by model-based RL while avoiding the costs associated with planning or simulated trajectories. We evaluate our algorithm, MR.Q, on a variety of common RL benchmarks with…
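To make the phrase "approximately linearize the value function" concrete, here is a toy NumPy sketch. It is not the MR.Q implementation. It assumes a fixed policy, a hand-built space of four state-action pairs, and an embedding `Z` in which both the reward (weights `w_r`) and the expected next embedding (matrix `W`) are exactly linear; all of these names and numbers are illustrative assumptions, not taken from the paper. Under those assumptions the value is itself linear in the embedding, with weights `w = (I - gamma * W)^{-1} w_r`, and the sketch checks this against the exact Bellman solution.

```python
import numpy as np

gamma = 0.9  # discount factor (illustrative value)

# Hypothetical Markov chain over 4 state-action pairs under a fixed policy.
# Pairs 0,1 form group A and pairs 2,3 form group B; each row sums to 1.
P = np.array([
    [0.1, 0.2, 0.3, 0.4],
    [0.3, 0.0, 0.5, 0.2],
    [0.4, 0.2, 0.1, 0.3],
    [0.1, 0.5, 0.4, 0.0],
])

# Hypothetical 2-dimensional embedding: one-hot group membership.
Z = np.array([
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 1.0],
])

# Linear "model" in embedding space: expected next embedding is Z @ W,
# and reward is Z @ w_r. Both hold exactly here by construction.
W = np.array([
    [0.3, 0.7],
    [0.6, 0.4],
])
w_r = np.array([1.0, 0.0])
r = Z @ w_r


def linear_value_weights(W, w_r, gamma):
    """Solve w = w_r + gamma * W @ w, the Bellman equation in embedding space."""
    d = W.shape[0]
    return np.linalg.solve(np.eye(d) - gamma * W, w_r)


def exact_q(P, r, gamma):
    """Solve Q = r + gamma * P @ Q directly over all state-action pairs."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, r)


w = linear_value_weights(W, w_r, gamma)
q_linear = Z @ w               # value read off linearly from the embedding
q_exact = exact_q(P, r, gamma)  # value from the full Bellman system

print("linear:", q_linear)
print("exact: ", q_exact)
```

A learned embedding would satisfy these two linearity conditions only approximately, so the linear value would then be an approximation rather than an exact match as in this toy case.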
Peer Reviews
Decision: ICLR 2025 Spotlight
1. MR.Q combines various state-of-the-art techniques and demonstrates promising performance gains with a single set of hyperparameters across multiple benchmarks, showing both efficacy and resource efficiency.
2. The paper provides detailed ablation studies, offering insights that may benefit future research in this field.
The theoretical motivation presented by the authors is somewhat confusing and challenging to interpret. Please refer to Question 3 for details.
* The paper is very well presented and has a clear structure.
* The core approach - using world model representations to support policy learning - is sound and well-motivated in prior literature.
* Strong efforts are made to make the model hyperparameter-insensitive, which is particularly valuable in RL where hyperparameter tuning is often ignored or exploited to give the appearance of improved performance.
* The evaluation is extensive in the number of benchmarks and ablations. A consistent set
## Hyperparameters
One of the primary aims of the paper is hyperparameter robustness, with multiple components being designed to improve robustness and a single set of hyperparameters being used to evaluate MR.Q. However, aiming for robustness does not allow hyperparameters to be ignored in evaluation. In fact, *more* attention should be paid to hyperparameters, to demonstrate that MR.Q is less sensitive to hyperparameters than baseline methods, or performs better when all algorithms are constraine
The authors propose an algorithm named MR.Q that performs well across domains. They conduct well-selected ablations on MR.Q to highlight what effect their design decisions have on the algorithm. The empirical results are broad and seem sound. Generally, a useful contribution to the field.
The authors argue that the benefits of model-based approaches may stem from their learned representations, rather than from their planning abilities. While this may be true to some degree, we assume that this is highly specific to the domains they conduct experiments on. In particular, Gym and DMC environments and most Atari games do not require much planning to be solved. However, we know that in other domains planning plays a critical role. Therefore, we believe it is important for the authors
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Sparse Evolutionary Training
