Scaling CrossQ with Weight Normalization

Daniel Palenicek; Florian Vogt; Jan Peters

arXiv:2506.03758·cs.LG·June 5, 2025

Open Access

TL;DR

This paper enhances the CrossQ reinforcement learning algorithm by integrating weight normalization, enabling stable scaling to higher update-to-data ratios and improving sample efficiency on complex benchmarks.

Contribution

We introduce weight normalization into CrossQ, allowing it to scale effectively with higher UTD ratios and stabilize training without network resets.

Findings

01 Achieves stable training at higher UTD ratios.

02 Achieves competitive or superior performance across challenging tasks on the DeepMind control benchmark.

03 Prevents Q-bias explosion and the growth in magnitude of the critic network weights.
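
For readers unfamiliar with the term, the update-to-data (UTD) ratio is the number of gradient updates performed per environment step. The following is a minimal sketch of where that ratio enters an off-policy training loop. It is not the authors' code: the function `train`, the placeholder transitions, and the counter standing in for a critic update are all hypothetical and only illustrate the loop structure.

```python
import random


def train(num_env_steps, utd_ratio, batch_size=4, seed=0):
    """Toy off-policy loop: `utd_ratio` updates per collected transition.

    Hypothetical illustration only; no real agent or environment is involved.
    """
    rng = random.Random(seed)
    replay_buffer = []
    num_updates = 0
    for step in range(num_env_steps):
        # Placeholder for one environment interaction (s, a, r, s').
        transition = (step, rng.random())
        replay_buffer.append(transition)
        # UTD ratio: how many gradient updates follow each environment step.
        for _ in range(utd_ratio):
            batch = rng.choices(replay_buffer, k=batch_size)
            # A real agent would compute a critic loss on `batch` and
            # take an optimizer step here; this sketch only counts updates.
            num_updates += 1
    return num_updates


updates_utd1 = train(num_env_steps=10, utd_ratio=1)
updates_utd5 = train(num_env_steps=10, utd_ratio=5)
print(updates_utd1, updates_utd5)
```

With the same number of environment steps, a higher UTD ratio reuses each collected transition in more updates, which is why it is associated with sample efficiency and also why it can amplify unstable training dynamics.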

Abstract

Reinforcement learning has achieved significant milestones, but sample efficiency remains a bottleneck for real-world applications. Recently, CrossQ has demonstrated state-of-the-art sample efficiency with a low update-to-data (UTD) ratio of 1. In this work, we explore CrossQ's scaling behavior with higher UTD ratios. We identify challenges in the training dynamics which are emphasized by higher UTDs, particularly Q-bias explosion and the growing magnitude of critic network weights. To address this, we integrate weight normalization into the CrossQ framework, a solution that stabilizes training, prevents potential loss of plasticity and keeps the effective learning rate constant. Our proposed approach reliably scales with increasing UTD ratios, achieving competitive or superior performance across a range of challenging tasks on the DeepMind control benchmark, notably the complex dog and…
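
The link between weight magnitude and effective learning rate can be illustrated with a toy example. When a layer is followed by a normalization layer, its output does not depend on the scale of its weights, so the gradient is orthogonal to the weight vector and shrinks as 1/||w||; the effective step size then behaves like lr / ||w||^2. The sketch below assumes a simple scheme (project the weights back to unit norm after every update) and a hand-made two-dimensional gradient with those properties. The abstract does not state the exact normalization scheme, so the helper names and the projection step are assumptions made for illustration.

```python
import math


def l2_norm(w):
    """Euclidean norm of a flat weight vector."""
    return math.sqrt(sum(x * x for x in w))


def project_to_unit_norm(w):
    """Rescale w to unit L2 norm (assumed normalization scheme)."""
    n = l2_norm(w)
    return [x / n for x in w]


def toy_scale_invariant_grad(w):
    """Toy gradient: orthogonal to w with magnitude 1 / ||w||.

    These are the two properties a gradient has when the loss
    depends only on the direction of w, not on its scale.
    """
    n2 = sum(x * x for x in w)
    return [-w[1] / n2, w[0] / n2]


def run(num_steps, lr, normalize):
    """Plain gradient descent, optionally followed by a unit-norm projection."""
    w = [0.6, 0.8]  # starts at unit norm
    for _ in range(num_steps):
        g = toy_scale_invariant_grad(w)
        w = [x - lr * gi for x, gi in zip(w, g)]
        if normalize:
            w = project_to_unit_norm(w)
    return w


def effective_lr(lr, w):
    """Effective step size for a scale-invariant layer: lr / ||w||^2."""
    return lr / l2_norm(w) ** 2


lr = 0.1
w_free = run(200, lr, normalize=False)  # norm drifts upward
w_wn = run(200, lr, normalize=True)     # norm pinned to 1

print("norm without projection:", l2_norm(w_free))
print("norm with projection:   ", l2_norm(w_wn))
print("effective lr without projection:", effective_lr(lr, w_free))
print("effective lr with projection:   ", effective_lr(lr, w_wn))
```

In this toy setting every unnormalized step adds a component orthogonal to w, so the norm can only grow and the effective learning rate decays over time. Projecting back to unit norm keeps the effective learning rate equal to the nominal one, which mirrors the behavior the abstract attributes to weight normalization.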

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Neural Networks and Reservoir Computing