SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning
Hojoon Lee, Dongyoon Hwang, Donghu Kim, Hyunseung Kim, Jun Jet Tai, Kaushik Subramanian, Peter R. Wurman, Jaegul Choo, Peter Stone, Takuma Seno

TL;DR
SimBa introduces a simple architecture that scales up parameters in deep reinforcement learning by embedding a simplicity bias, leading to improved sample efficiency and state-of-the-art performance across various environments.
Contribution
The paper proposes SimBa, a novel architecture that effectively scales parameters in deep RL through simplicity bias, enhancing performance and efficiency.
Findings
Improves sample efficiency across multiple RL algorithms.
Matches or surpasses state-of-the-art performance.
Demonstrates broad applicability across environments.
Abstract
Recent advances in CV and NLP have been largely driven by scaling up the number of network parameters, despite traditional theories suggesting that larger networks are prone to overfitting. These large networks avoid overfitting by integrating components that induce a simplicity bias, guiding models toward simple and generalizable solutions. However, in deep RL, designing and scaling up networks have been less explored. Motivated by this opportunity, we present SimBa, an architecture designed to scale up parameters in deep RL by injecting a simplicity bias. SimBa consists of three components: (i) an observation normalization layer that standardizes inputs with running statistics, (ii) a residual feedforward block to provide a linear pathway from the input to output, and (iii) a layer normalization to control feature magnitudes. By scaling up parameters with SimBa, the sample efficiency…
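The three components named in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the class and function names (`RunningObsNorm`, `ResidualFFBlock`, `simba_like_encoder`), the pre-normalization inside the block, the 4x hidden expansion, the weight initialization, and the omission of learnable layer-norm scale and shift are all assumptions made for the example.

```python
import numpy as np


class RunningObsNorm:
    """Standardizes observations with running mean/variance (hypothetical helper)."""

    def __init__(self, dim, eps=1e-8):
        self.mean = np.zeros(dim)
        self.var = np.ones(dim)
        self.count = 0
        self.eps = eps

    def update(self, batch):
        # Merge batch statistics into the running statistics
        # using the parallel mean/variance combination formula.
        batch = np.atleast_2d(batch)
        n = batch.shape[0]
        b_mean, b_var = batch.mean(axis=0), batch.var(axis=0)
        total = self.count + n
        delta = b_mean - self.mean
        m2 = self.var * self.count + b_var * n + delta**2 * self.count * n / total
        self.mean = self.mean + delta * n / total
        self.var = m2 / total
        self.count = total

    def __call__(self, obs):
        return (obs - self.mean) / np.sqrt(self.var + self.eps)


def layer_norm(x, eps=1e-5):
    # Normalizes each feature vector to zero mean and unit variance
    # (learnable scale/shift omitted for brevity).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


class ResidualFFBlock:
    """x + MLP(LayerNorm(x)); the skip connection is the linear pathway."""

    def __init__(self, dim, rng, expansion=4):  # expansion factor is an assumption
        hidden = expansion * dim
        self.w1 = rng.normal(0.0, np.sqrt(2.0 / dim), (dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, np.sqrt(1.0 / hidden), (hidden, dim))
        self.b2 = np.zeros(dim)

    def __call__(self, x):
        h = layer_norm(x)
        h = np.maximum(h @ self.w1 + self.b1, 0.0)  # ReLU
        return x + h @ self.w2 + self.b2


def simba_like_encoder(obs, obs_norm, w_in, blocks):
    # (i) normalize observations, (ii) residual blocks, (iii) final layer norm.
    x = obs_norm(obs) @ w_in
    for block in blocks:
        x = block(x)
    return layer_norm(x)


rng = np.random.default_rng(0)
obs_dim, hidden_dim = 6, 16  # illustrative sizes only
obs_norm = RunningObsNorm(obs_dim)
batch = rng.normal(loc=5.0, scale=3.0, size=(256, obs_dim))  # synthetic observations
obs_norm.update(batch)
w_in = rng.normal(0.0, np.sqrt(1.0 / obs_dim), (obs_dim, hidden_dim))
blocks = [ResidualFFBlock(hidden_dim, rng) for _ in range(2)]
features = simba_like_encoder(batch, obs_norm, w_in, blocks)
print(features.shape)
```

In this sketch the skip connection means that a block whose second linear layer outputs zero reduces to the identity, which is one way to read the "linear pathway from the input to output" described above.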
Peer Reviews
Decision: ICLR 2025 Spotlight
1. **Innovative Application of Simplicity Bias**: The use of simplicity bias in SimBa to manage overparameterization effectively is a novel contribution to deep RL.
2. **Comprehensive Empirical Validation**: SimBa's performance is rigorously tested across multiple RL tasks, including DMC, MyoSuite, and HumanoidBench, showing consistent improvements in efficiency and scalability.
3. **Adaptability**: SimBa’s architecture is algorithm-agnostic and can integrate seamlessly with various RL algorit
1. Missing related work on normalization. Researchers have shown that RMS Norm works well in training foundation models. The proposed RSNorm is too similar to RMS Norm, yet the authors do not discuss RMS Norm, not even in the related work. Since this work investigates designing and scaling up networks in deep RL, the same tricks used in designing and scaling up LLMs should be considered.
2. Unfair comparison between SAC+SimBa vs. others. Since SAC+SimBa introduces some other layer
- Novel approach: SimBa addresses an important gap in deep RL research by exploring how to scale up network parameters while leveraging simplicity bias effectively.
- Versatility: The architecture improves sample efficiency across various RL algorithms, including off-policy, on-policy, and unsupervised methods.
- Performance: When applied to SAC, SimBa matches or surpasses state-of-the-art off-policy RL methods across a wide range of tasks.
- Computational efficiency: SimBa achieves high perform
- While the evaluation covers 51 continuous control tasks, it might be beneficial to see SimBa's performance on a wider range of RL domains, particularly with images.
- The paper doesn't discuss potential limitations or scenarios where SimBa might not be as effective.
1. Clear and Well-Motivated Objective: The paper identifies simplicity bias as an underexplored factor in RL network scaling.
2. Reproducibility Efforts: A public codebase and descriptions of evaluation setups are provided.
3. Broad experimental setup.
1. Statistical significance is not apparent from the presented results. The standard errors overlap in some plots (e.g., Figure 1b, and BRO and SimBa in Figure 5b), and Figure 8 has no ranges, though I guess it would be more challenging to add them since the analysis is in two dimensions, and also from Figure 14. The tables in the Appendix do not have standard deviations. Moreover, the authors could use a t-test or Mann–Whitney U test to demonstrate statistical significance.
2. It would be valuable to s
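The Mann–Whitney U test suggested in the review above could be applied to per-seed returns roughly as follows. This is a standard-library sketch under stated assumptions: the function name `mann_whitney_u` is hypothetical, the p-value uses the normal approximation without tie or continuity correction (crude for small samples), and the two lists of returns are made-up placeholders, not results from the paper.

```python
import math


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test via the normal approximation (hypothetical helper)."""
    combined = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        # Find the run of tied values and give each the average rank.
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_a, n_b = len(a), len(b)
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u = min(u_a, n_a * n_b - u_a)
    mean_u = n_a * n_b / 2
    std_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u - mean_u) / std_u
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p_value


# Placeholder per-seed returns for two methods (illustrative values only).
returns_a = [712.0, 698.5, 730.2, 705.1, 721.4]
returns_b = [655.3, 690.0, 640.8, 671.9, 662.7]
u_stat, p_value = mann_whitney_u(returns_a, returns_b)
print(f"U = {u_stat:.1f}, two-sided p ~ {p_value:.4f}")
```

With only a handful of seeds per method, an exact test or a library routine would be preferable to the normal approximation used here.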
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications
Methods: Dilated Convolution · Average Pooling · 1x1 Convolution · Convolution · Layer Normalization · Global Average Pooling · Switchable Atrous Convolution
