Hyperspherical Normalization for Scalable Deep Reinforcement Learning

Hojoon Lee; Youngdo Lee; Takuma Seno; Donghu Kim; Peter Stone; Jaegul Choo

arXiv:2502.15280 · cs.LG · May 30, 2025


TL;DR

This paper introduces SimbaV2, a novel reinforcement learning architecture that stabilizes training with hyperspherical normalization and reward scaling, enabling scalable, high-performance learning on complex continuous control tasks.

Contribution

The paper proposes SimbaV2, a new RL architecture that stabilizes training for large models using hyperspherical normalization and distributional value estimation.
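At its core, hyperspherical normalization projects vectors onto the unit sphere. The plain-Python sketch below shows this constraint only. It L2-normalizes a layer's output feature and re-projects every weight row to unit norm after a simulated optimizer step. The helper names (`l2_normalize`, `project_weights`, `hypersphere_layer`) and the choice to project after every update are assumptions made for illustration. The full SimbaV2 architecture has further components not shown here, so read this as a sketch of the idea rather than the authors' code.

```python
import math
import random


def l2_normalize(v, eps=1e-8):
    """Project a vector onto the unit hypersphere by dividing by its L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]  # eps guards against dividing by zero


def project_weights(weight_rows):
    """Re-project every weight row to unit norm (assumed: after each optimizer step)."""
    return [l2_normalize(row) for row in weight_rows]


def hypersphere_layer(weight_rows, x):
    """Hypothetical layer: matrix-vector product, then L2-normalize the output feature."""
    out = [sum(w * xi for w, xi in zip(row, x)) for row in weight_rows]
    return l2_normalize(out)


# Usage: a fake "update" inflates the weights; projection restores unit norms.
random.seed(0)
W = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]
W = [[3.0 * w for w in row] for row in W]   # stand-in for norm growth during training
W = project_weights(W)
h = hypersphere_layer(W, [1.0, -2.0, 0.5, 4.0])
print([round(math.sqrt(sum(w * w for w in row)), 6) for row in W])  # → [1.0, 1.0, 1.0]
print(round(math.sqrt(sum(v * v for v in h)), 6))                    # → 1.0
```

Every weight row and every feature has unit norm. The dot product of a weight row with a unit-norm input is therefore a cosine similarity bounded in [-1, 1], and neither weight norms nor feature norms can drift upward during training.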

Findings

01. Achieves state-of-the-art results on 57 continuous control tasks across 4 domains

02. Scales effectively with larger models and more compute

03. Stabilizes RL optimization with hyperspherical normalization and reward scaling

Abstract

Scaling up model size and computation has brought consistent performance improvements in supervised learning. However, this lesson often fails to transfer to reinforcement learning (RL), because training a model on non-stationary data easily leads to overfitting and unstable optimization. In response, we introduce SimbaV2, a novel RL architecture designed to stabilize optimization by (i) constraining the growth of weight and feature norms via hyperspherical normalization; and (ii) using distributional value estimation with reward scaling to maintain stable gradients under varying reward magnitudes. Using soft actor-critic (SAC) as the base algorithm, SimbaV2 scales up effectively with larger models and greater compute, achieving state-of-the-art performance on 57 continuous control tasks across 4 domains. The code is available at https://dojeon-ai.github.io/SimbaV2.
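Part (ii) of the abstract combines two components, each sketched separately below. The code rests on two assumptions about how such components are commonly built, not on the paper's exact formulation. (a) Reward scaling divides each reward by a running standard deviation of the discounted return. (b) The critic is a C51-style categorical distribution whose Bellman target is projected onto a fixed grid of atoms and trained with cross-entropy. The class and function names, `gamma`, and the `[v_min, v_max]` support are hypothetical and chosen for illustration.

```python
import math


class RunningMeanStd:
    """Welford running mean/variance of a scalar stream."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def var(self):
        # Fall back to 1.0 until there are at least two samples.
        return self.m2 / self.count if self.count > 1 else 1.0


class RewardScaler:
    """Hypothetical scaler: divide each reward by the running std of the discounted return."""

    def __init__(self, gamma=0.99, eps=1e-8):
        self.gamma = gamma
        self.eps = eps
        self.ret = 0.0
        self.stats = RunningMeanStd()

    def __call__(self, reward, done):
        self.ret = self.gamma * self.ret + reward
        self.stats.update(self.ret)
        if done:
            self.ret = 0.0  # reset the running return at episode boundaries
        return reward / math.sqrt(self.stats.var + self.eps)


def project_categorical(next_probs, reward, done, gamma, v_min, v_max):
    """C51-style projection of r + gamma * Z(s') onto a fixed atom grid."""
    n = len(next_probs)
    dz = (v_max - v_min) / (n - 1)
    atoms = [v_min + i * dz for i in range(n)]
    target = [0.0] * n
    for p, z in zip(next_probs, atoms):
        tz = reward + (0.0 if done else gamma) * z      # shifted and shrunk atom
        tz = min(max(tz, v_min), v_max)                 # clip to the support
        b = min(max((tz - v_min) / dz, 0.0), n - 1)     # fractional atom index
        lo, hi = math.floor(b), math.ceil(b)
        if lo == hi:
            target[lo] += p                             # lands exactly on an atom
        else:
            target[lo] += p * (hi - b)                  # split the mass between
            target[hi] += p * (b - lo)                  # the two neighbouring atoms
    return atoms, target


def cross_entropy(target, pred_probs, eps=1e-12):
    """Critic loss: cross-entropy between the projected target and the prediction."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, pred_probs))


# Usage: scale a stream of large raw rewards, then build a categorical target.
scaler = RewardScaler(gamma=0.99)
raw_rewards = [100.0, 250.0, 80.0, 300.0]
scaled = [scaler(r, done=False) for r in raw_rewards]
next_probs = [1.0 / 11] * 11                            # uniform Z(s') over 11 atoms
atoms, target = project_categorical(next_probs, scaled[-1], False, 0.99, -5.0, 5.0)
loss = cross_entropy(target, next_probs)
print(round(sum(target), 6))  # → 1.0 (projection preserves probability mass)
```

One plausible reading of why the two components are paired is this. Scaling keeps rewards in a roughly fixed range regardless of the raw reward magnitude, so a fixed atom support can cover the Bellman targets without retuning per task.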


Code & Models

Datasets

joonleesky/simbaV2


Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Adversarial Robustness in Machine Learning

Methods: Balanced Selection