Low-Rank Adaptation for Critic Learning in Off-Policy Reinforcement Learning
Yuan Zhuang, Yuexin Bian, Sihong He, Jie Feng, Qing Su, Songyang Han, Jonathan Petit, Shihao Ji, Yuanyuan Shi, Fei Miao

TL;DR
This paper introduces Low-Rank Adaptation (LoRA) as a structural regularizer for critic learning in off-policy reinforcement learning, improving stability and performance by constraining updates to a low-dimensional subspace.
Contribution
The paper proposes using LoRA to regularize critic learning, which reduces overfitting and instability, leading to better performance across various RL algorithms and architectures.
Findings
LoRA reduces critic loss during training
LoRA improves overall policy performance
Achieves the best or competitive results on most evaluated tasks
Abstract
Scaling critic capacity is a promising direction for improving off-policy reinforcement learning (RL). However, recent work shows that larger critics are prone to overfitting and instability in replay-based bootstrapped training. In this paper, we propose using Low-Rank Adaptation (LoRA) as a structural regularizer for critic learning. Our approach freezes randomly initialized base matrices and optimizes only the corresponding low-rank adapters, thereby constraining critic updates to a low-dimensional subspace. We evaluate our method across different off-policy RL algorithms, including SAC and FastTD3, with different network architectures. Empirically, LoRA efficiently reduces critic loss during training and improves overall policy performance, achieving the best or competitive results on most tasks. Extensive experiments demonstrate that our low-rank updates provide a simple and…
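The mechanism described in the abstract (a frozen, randomly initialized base matrix plus trainable low-rank adapters) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `LoRALinear`, the single linear layer standing in for a critic, the zero initialization of `B` (a common LoRA convention), the squared-error loss, and all hyperparameters are assumptions chosen for illustration.

```python
import random


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LoRALinear:
    """Hypothetical layer computing y = (W0 + B A) x.

    W0 is randomly initialized and never updated; only the rank-r
    factors A (r x d_in) and B (d_out x r) receive gradient steps.
    """

    def __init__(self, d_in, d_out, rank, seed=0):
        rng = random.Random(seed)
        scale = d_in ** -0.5
        # Frozen base matrix (assumed Gaussian init).
        self.W0 = [[rng.gauss(0, scale) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable adapters; B starts at zero so the initial output is W0 x.
        self.A = [[rng.gauss(0, scale) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        h = matvec(self.A, x)  # project input into the rank-r subspace
        return [b + l for b, l in zip(matvec(self.W0, x), matvec(self.B, h))]

    def sgd_step(self, x, grad_y, lr):
        """Update A and B only, given dL/dy; W0 is left untouched."""
        h = matvec(self.A, x)
        # dL/dh = B^T grad_y, computed before B is modified.
        grad_h = [sum(self.B[i][k] * grad_y[i] for i in range(len(grad_y)))
                  for k in range(len(h))]
        for i, g in enumerate(grad_y):      # dL/dB = grad_y h^T
            for k, hk in enumerate(h):
                self.B[i][k] -= lr * g * hk
        for k, gh in enumerate(grad_h):     # dL/dA = grad_h x^T
            for j, xj in enumerate(x):
                self.A[k][j] -= lr * gh * xj


# Usage: one regression step toward a made-up scalar target.
layer = LoRALinear(d_in=4, d_out=1, rank=2)
x, target = [1.0, 0.5, -0.5, 2.0], 1.0
y = layer.forward(x)[0]
layer.sgd_step(x, grad_y=[2.0 * (y - target)], lr=0.01)
```

Because every update is routed through `A` and `B`, the effective weight change `B A` has rank at most `rank`, which is the low-dimensional constraint the abstract refers to.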
