Low-Rank Adaptation for Critic Learning in Off-Policy Reinforcement Learning

Yuan Zhuang; Yuexin Bian; Sihong He; Jie Feng; Qing Su; Songyang Han; Jonathan Petit; Shihao Ji; Yuanyuan Shi; Fei Miao

arXiv:2604.18978 · cs.LG · May 8, 2026

TL;DR

This paper introduces Low-Rank Adaptation (LoRA) as a structural regularizer for critic learning in off-policy reinforcement learning, improving stability and performance by constraining updates to a low-dimensional subspace.
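For reference, here is the common LoRA parameterization applied to a single critic layer. This summary does not state the paper's scaling factor or initialization, so the α/r scaling and the symbols below are assumptions borrowed from the usual LoRA convention, not the authors' exact formulation. The parameter count is simple arithmetic for an illustrative 1024 × 1024 layer with rank 8.

```latex
\begin{aligned}
W &= W_0 + \tfrac{\alpha}{r} B A, && W_0 \in \mathbb{R}^{d_{\text{out}}\times d_{\text{in}}}\ \text{(random, frozen)},\ B \in \mathbb{R}^{d_{\text{out}}\times r},\ A \in \mathbb{R}^{r\times d_{\text{in}}}\ \text{(trainable)} \\
\Delta W &= W - W_0 = \tfrac{\alpha}{r} B A && \Longrightarrow\ \operatorname{rank}(\Delta W) \le r \\
\#\text{trainable} &= r\,(d_{\text{in}} + d_{\text{out}}) && \text{e.g. } d_{\text{in}} = d_{\text{out}} = 1024,\ r = 8:\ 16{,}384 \text{ vs. } 1{,}048{,}576\ (1.5625\%)
\end{aligned}
```

Under this reading, "constraining updates to a low-dimensional subspace" means that, however long training runs, the learned change to each weight matrix never exceeds rank r.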

Contribution

The paper proposes using LoRA to regularize critic learning. Constraining critic updates in this way reduces overfitting and instability, leading to better performance across the evaluated off-policy algorithms (SAC and FastTD3) and network architectures.

Findings

01. LoRA reduces critic loss during training
02. LoRA improves overall policy performance
03. LoRA achieves the best or competitive results on most tasks

Abstract

Scaling critic capacity is a promising direction for improving off-policy reinforcement learning (RL). However, recent work shows that larger critics are prone to overfitting and instability in replay-based bootstrapped training. In this paper, we propose using Low-Rank Adaptation (LoRA) as a structural regularizer for critic learning. Our approach freezes randomly initialized base matrices and optimizes only the corresponding low-rank adapters, thereby constraining critic updates to a low-dimensional subspace. We evaluate our method across different off-policy RL algorithms, including SAC and FastTD3, built on different network architectures. Empirically, LoRA efficiently reduces critic loss during training and improves overall policy performance, achieving the best or competitive results on most tasks. Extensive experiments demonstrate that our low-rank updates provide a simple and…
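The mechanism described in the abstract (frozen, randomly initialized base matrices; trainable low-rank adapters; bootstrapped critic regression on replayed transitions) can be sketched in NumPy as below. This is a hypothetical minimal sketch, not the authors' implementation: the class names `LoRALinear` and `LoRACritic`, the one-hidden-layer MLP, the alpha/rank scaling, the zero-initialized `B`, plain SGD, and the synthetic replay batch are all assumptions. SAC and FastTD3 add components such as twin critics, entropy terms and target-network averaging that are omitted here.

```python
import numpy as np


class LoRALinear:
    """Hypothetical dense layer with effective weight W0 + scale * B @ A.

    W0 and b0 are randomly initialized and never updated; only the
    low-rank adapters A (rank x d_in) and B (d_out x rank) are trained.
    """

    def __init__(self, d_in, d_out, rank, alpha=None, rng=None):
        rng = rng if rng is not None else np.random.default_rng(0)
        self.W0 = rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(d_out, d_in))  # frozen
        self.b0 = np.zeros(d_out)                                          # frozen
        self.A = rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(rank, d_in))   # trainable
        self.B = np.zeros((d_out, rank))  # trainable; zero init, so W starts at W0
        self.scale = (rank if alpha is None else alpha) / rank  # assumed alpha/rank

    def weight(self):
        return self.W0 + self.scale * self.B @ self.A

    def forward(self, x):
        self.x = x  # cache the input for backward()
        return x @ self.weight().T + self.b0

    def backward(self, grad_out, lr):
        """SGD step on A and B only; returns the gradient w.r.t. the input."""
        grad_x = grad_out @ self.weight()  # uses the pre-update weight
        grad_W = grad_out.T @ self.x       # dL/dW for the effective weight
        grad_B = self.scale * grad_W @ self.A.T  # chain rule through scale * B @ A
        grad_A = self.scale * self.B.T @ grad_W
        self.B -= lr * grad_B
        self.A -= lr * grad_A
        return grad_x


class LoRACritic:
    """Hypothetical critic Q(s, a) = l2(relu(l1([s, a]))), both layers LoRA-adapted."""

    def __init__(self, obs_dim, act_dim, hidden=32, rank=4, seed=0):
        rng = np.random.default_rng(seed)
        self.l1 = LoRALinear(obs_dim + act_dim, hidden, rank, rng=rng)
        self.l2 = LoRALinear(hidden, 1, rank, rng=rng)

    def q(self, s, a):
        self.pre = self.l1.forward(np.concatenate([s, a], axis=1))
        return self.l2.forward(np.maximum(self.pre, 0.0))[:, 0]

    def step(self, s, a, target, lr):
        """One SGD step on the mean squared TD error; returns the pre-step loss."""
        err = self.q(s, a) - target
        grad_q = (2.0 / len(err)) * err[:, None]  # d mean(err^2) / dq
        grad_h = self.l2.backward(grad_q, lr)
        self.l1.backward(grad_h * (self.pre > 0), lr)  # ReLU derivative mask
        return float(np.mean(err ** 2))


# Synthetic replay batch (assumption); a_next stands in for actor samples.
rng = np.random.default_rng(1)
n, obs_dim, act_dim, gamma = 256, 4, 2, 0.99
s, a = rng.normal(size=(n, obs_dim)), rng.normal(size=(n, act_dim))
r = rng.normal(size=n)
done = (rng.random(n) < 0.05).astype(float)
s_next, a_next = rng.normal(size=(n, obs_dim)), rng.normal(size=(n, act_dim))

critic = LoRACritic(obs_dim, act_dim)
W0_before = critic.l1.W0.copy()
# Bootstrapped targets computed once from the initial critic, standing in
# for a frozen target network during these updates.
target = r + gamma * (1.0 - done) * critic.q(s_next, a_next)
losses = [critic.step(s, a, target, lr=0.02) for _ in range(300)]

l1 = critic.l1
delta_rank = np.linalg.matrix_rank(l1.scale * l1.B @ l1.A)  # rank of W - W0
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}, rank(dW1) = {delta_rank}")
```

Because `W0` never changes, every critic update moves the first layer's weight only by a term of the form `scale * B @ A`, so its rank stays at or below 4 no matter how many steps are taken. That bounded update is the structural constraint the abstract describes, while the frozen random base still supplies full-width features.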
