Clustering-Based Weight Orthogonalization for Stabilizing Deep Reinforcement Learning

Guoqing Ma; Yuhan Zhang; Yuming Dai; Guangfu Hao; Yang Chen; Shan Yu

arXiv:2511.11607 · cs.LG · November 18, 2025

Open Access

TL;DR

This paper introduces the COWM layer, a clustering-based weight orthogonalization technique that stabilizes deep reinforcement learning, improves sample efficiency, and outperforms existing methods across multiple benchmarks.

Contribution

The paper proposes the Clustering Orthogonal Weight Modified (COWM) layer, which combines clustering with a projection matrix to mitigate non-stationarity in RL, improving both stability and learning speed.
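
The listing does not give the layer's update rule, so the following is only an illustrative sketch of how clustering and a projection matrix could be combined, not the authors' implementation. It assumes a recursive orthogonal-weight-modification style projector and plain k-means over a batch of layer inputs. The function names (kmeans_centroids, update_projector, projected_step) and the parameters alpha and lr are hypothetical.

```python
import numpy as np

def kmeans_centroids(x, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns (k, d) centroids of the rows of x."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each row to its nearest centroid (squared Euclidean distance).
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def update_projector(P, centroids, alpha=1e-3):
    """Assumed OWM-style recursive update: shrink P along each centroid."""
    for c in centroids:
        Pc = P @ c
        # P is symmetric, so outer(Pc, Pc) equals P c c^T P.
        P = P - np.outer(Pc, Pc) / (alpha + c @ Pc)
    return P

def projected_step(W, grad, P, lr=0.1):
    """Gradient step on W (d_in, d_out) with the gradient pre-multiplied by P."""
    return W - lr * (P @ grad)
```

Under these assumptions, a step taken through P changes the layer's output on the stored centroids very little, which is one way a projection could limit interference between old and new inputs. Summarizing a batch by a few centroids keeps the number of projector updates small compared with updating on every sample.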

Findings

01 COWM outperforms state-of-the-art methods by 9% and 12.6% on benchmarks.

02 COWM reduces gradient interference and stabilizes learning.

03 The approach is robust and general across various RL algorithms.

Abstract

Reinforcement learning (RL) has made significant advancements, achieving superhuman performance in various tasks. However, RL agents often operate under the assumption of environmental stationarity, which poses a great challenge to learning efficiency since many environments are inherently non-stationary. This non-stationarity results in the requirement of millions of iterations, leading to low sample efficiency. To address this issue, we introduce the Clustering Orthogonal Weight Modified (COWM) layer, which can be integrated into the policy network of any RL algorithm and mitigate non-stationarity effectively. The COWM layer stabilizes the learning process by employing clustering techniques and a projection matrix. Our approach not only improves learning speed but also reduces gradient interference, thereby enhancing the overall learning efficiency. Empirically, the COWM outperforms…

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Robot Manipulation and Learning