Beyond Action Residuals: Real-World Robot Policy Steering via Bottleneck Latent Reinforcement Learning
Dongjie Yu, Kun Lei, Zhennan Jiang, Jia Pan, Huazhe Xu

TL;DR
This paper introduces Z-Perturbation Reinforcement Learning (ZPRL), which adapts a pretrained robot policy online by steering it through a compact bottleneck latent rather than through action-space residuals, improving sample efficiency and task performance.
Contribution
ZPRL uses a variational information bottleneck to enable online RL fine-tuning via residuals in a latent space, leading to more effective and smoother policy adaptation.
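To make the contrast with action-space residuals concrete, the following is a minimal sketch, not the authors' implementation. It assumes a diagonal-Gaussian bottleneck with a standard-normal prior (the usual VIB setup), and it uses random linear maps in place of the pretrained encoder and action head. All names, dimensions, and the `scale` parameter are hypothetical. In an actual system the residuals `delta_z` or `delta_a` would come from an RL-trained policy; here they are plain inputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
OBS_DIM, LATENT_DIM, ACTION_DIM = 16, 4, 7

# Stand-ins for frozen pretrained weights (random here, an assumption).
W_mu = rng.normal(scale=0.1, size=(LATENT_DIM, OBS_DIM))
W_logvar = rng.normal(scale=0.1, size=(LATENT_DIM, OBS_DIM))
W_dec = rng.normal(scale=0.1, size=(ACTION_DIM, LATENT_DIM))


def encode(obs_embedding):
    """Bottleneck encoder: mean and log-variance of a diagonal Gaussian over z."""
    return W_mu @ obs_embedding, W_logvar @ obs_embedding


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), the usual VIB regularizer."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)


def decode(z):
    """Frozen action head mapping the latent to an action in [-1, 1]."""
    return np.tanh(W_dec @ z)


def steer_in_latent(obs_embedding, delta_z, scale=0.1):
    """Latent residual: perturb the bottleneck mean, then decode."""
    mu, _ = encode(obs_embedding)
    return decode(mu + scale * delta_z)


def steer_in_action(obs_embedding, delta_a, scale=0.1):
    """Action residual baseline: perturb the decoded action directly."""
    mu, _ = encode(obs_embedding)
    return np.clip(decode(mu) + scale * delta_a, -1.0, 1.0)


obs = rng.normal(size=OBS_DIM)
base_action = decode(encode(obs)[0])
latent_steered = steer_in_latent(obs, rng.normal(size=LATENT_DIM))
action_steered = steer_in_action(obs, rng.normal(size=ACTION_DIM))
```

In this sketch the latent residual has `LATENT_DIM` degrees of freedom and always passes through the frozen action head, so every perturbed action stays on the set of actions that head can produce. The action residual has `ACTION_DIM` independent degrees of freedom and can move the action anywhere within the clipped range.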
Findings
ZPRL outperforms baselines in simulation and real-world tasks.
ZPRL improves success rates by 33.7% on average in real-world experiments.
ZPRL achieves better sample efficiency and smoother exploration behavior.
Abstract
Pretrained imitation policies have become a strong foundation for robot manipulation, but they often require online improvement to overcome execution errors, limited dataset coverage, and deployment mismatch. A central question is therefore how reinforcement learning (RL) should adapt policies after offline pretraining. Existing lightweight methods commonly apply residual corrections directly in action space, but this often leads to noisy and poorly structured exploration. In this work, we propose Z-Perturbation Reinforcement Learning (ZPRL), an approach that steers pretrained policies through a compact bottleneck latent rather than through policy weights or output actions. During offline training, we augment the policy with a plug-and-play variational information bottleneck (VIB) module to extract a task-relevant latent interface from observation embeddings. During online finetuning,…
