Scaling CrossQ with Weight Normalization
Daniel Palenicek, Florian Vogt, Jan Peters

TL;DR
This paper enhances the CrossQ reinforcement learning algorithm by integrating weight normalization, enabling stable scaling to higher update-to-data ratios and improving sample efficiency on complex benchmarks.
Contribution
We introduce weight normalization into CrossQ, allowing it to scale effectively with higher UTD ratios and stabilize training without network resets.
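As a rough illustration of what weight normalization can look like in practice, the sketch below rescales each row of a weight matrix to unit L2 norm after a plain gradient step. This summary does not specify the exact scheme used in the paper, so the helper names (`project_to_unit_norm`, `sgd_step`), the row-wise projection, and the toy numbers are all assumptions made for illustration.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def project_to_unit_norm(weights, eps=1e-8):
    """Rescale each row of a weight matrix (list of lists) to unit L2 norm.

    Hypothetical helper: row-wise projection is an assumption, not
    necessarily the scheme used in the paper.
    """
    projected = []
    for row in weights:
        scale = max(l2_norm(row), eps)  # guard against division by zero
        projected.append([x / scale for x in row])
    return projected


def sgd_step(weights, grads, lr):
    """Plain gradient descent step, used here only to move the weights."""
    return [
        [w - lr * g for w, g in zip(w_row, g_row)]
        for w_row, g_row in zip(weights, grads)
    ]


# Toy weights and gradients (made-up values).
weights = [[3.0, 4.0], [0.5, 0.0]]
grads = [[1.0, -2.0], [0.1, 0.3]]

# Update, then project back so the weight norm cannot grow over time.
updated = project_to_unit_norm(sgd_step(weights, grads, lr=0.1))
row_norms = [l2_norm(row) for row in updated]
```

Because the projection runs after every update, the weight magnitude stays fixed no matter how many gradient steps are taken per environment step.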
Findings
Achieves stable training at higher UTD ratios.
Achieves competitive or superior performance compared with previous methods on the DeepMind control benchmark.
Prevents Q-bias explosion and weight growth issues.
Abstract
Reinforcement learning has achieved significant milestones, but sample efficiency remains a bottleneck for real-world applications. Recently, CrossQ has demonstrated state-of-the-art sample efficiency with a low update-to-data (UTD) ratio of 1. In this work, we explore CrossQ's scaling behavior with higher UTD ratios. We identify challenges in the training dynamics which are emphasized by higher UTDs, particularly Q-bias explosion and the growing magnitude of critic network weights. To address this, we integrate weight normalization into the CrossQ framework, a solution that stabilizes training, prevents potential loss of plasticity and keeps the effective learning rate constant. Our proposed approach reliably scales with increasing UTD ratios, achieving competitive or superior performance across a range of challenging tasks on the DeepMind control benchmark, notably the complex dog and…
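The link between weight growth and the effective learning rate can be checked numerically. The sketch below assumes a scale-invariant function, as arises when a layer's output is normalized, and uses a toy objective f(w) = <w/||w||, t> that is not taken from the paper. For such a function the gradient shrinks in proportion to 1/||w||, so a fixed step size changes the weight direction less and less as the weights grow.

```python
import math


def norm(vec):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def scale_invariant_grad(w, t):
    """Gradient of the toy objective f(w) = <w / ||w||, t> with respect to w.

    f is unchanged when w is multiplied by any positive constant, which
    mimics a layer whose output is normalized (an illustrative assumption).
    """
    n = norm(w)
    u = [x / n for x in w]                    # unit direction of w
    ut = sum(a * b for a, b in zip(u, t))     # projection of t onto u
    return [(tb - ut * ub) / n for ub, tb in zip(u, t)]


w = [3.0, 4.0]   # made-up weight vector with norm 5
t = [1.0, 0.0]   # made-up target direction

g_small = scale_invariant_grad(w, t)
g_large = scale_invariant_grad([2.0 * x for x in w], t)

# Doubling the weight norm halves the gradient norm.
ratio = norm(g_large) / norm(g_small)
```

Holding the weight norm fixed removes this dependence, which is one way to read the claim that normalization keeps the effective learning rate constant.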
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Neural Networks and Reservoir Computing
