Beyond Action Residuals: Real-World Robot Policy Steering via Bottleneck Latent Reinforcement Learning

Dongjie Yu; Kun Lei; Zhennan Jiang; Jia Pan; Huazhe Xu

arXiv:2605.19919 · cs.RO · May 20, 2026

TL;DR

This paper introduces Z-Perturbation Reinforcement Learning (ZPRL), a novel method that improves robot policy adaptation by steering a pretrained policy through a compact latent space, enhancing sample efficiency and performance.

Contribution

ZPRL uses a variational information bottleneck to enable online RL fine-tuning via residuals in a latent space, leading to more effective and smoother policy adaptation.
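The idea of steering through a bottleneck latent can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`vib_encode`, `steer_latent`, `decode_action`), the linear maps standing in for networks, the standard-normal prior, and the tanh-bounded residual with a `scale` parameter are all assumptions made for illustration. The sketch shows a stochastic bottleneck latent with its KL regularizer, and a residual added to that latent rather than to the output action.

```python
import numpy as np

def vib_encode(obs_emb, W_mu, W_logvar, rng):
    """Hypothetical VIB head: map an observation embedding to a latent sample z.

    Linear maps stand in for whatever network the real method uses.
    """
    mu = obs_emb @ W_mu
    logvar = obs_emb @ W_logvar
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps  # reparameterization trick
    # KL( N(mu, sigma^2) || N(0, I) ), summed over latent dimensions.
    # The standard-normal prior is an assumption of this sketch.
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)
    return z, mu, kl

def steer_latent(z_base, delta_z, scale=0.1):
    """Add a bounded residual in latent space (assumed form: scale * tanh)."""
    return z_base + scale * np.tanh(delta_z)

def decode_action(z, W_act):
    """Hypothetical action head of the pretrained policy, held fixed here."""
    return np.tanh(z @ W_act)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    obs_dim, latent_dim, act_dim = 16, 4, 7  # illustrative sizes only
    obs_emb = rng.standard_normal((3, obs_dim))
    W_mu = 0.1 * rng.standard_normal((obs_dim, latent_dim))
    W_logvar = 0.1 * rng.standard_normal((obs_dim, latent_dim))
    W_act = 0.1 * rng.standard_normal((latent_dim, act_dim))

    z, mu, kl = vib_encode(obs_emb, W_mu, W_logvar, rng)
    # In online finetuning, delta_z would come from an RL policy;
    # here it is random noise purely to exercise the code path.
    delta_z = rng.standard_normal(z.shape)
    action = decode_action(steer_latent(z, delta_z), W_act)
    print(action.shape, kl.shape)
```

In this sketch the RL agent only needs to output a `latent_dim`-sized residual, which is one plausible reading of why a compact latent interface could make exploration more structured than perturbing actions directly.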

Findings

01 ZPRL outperforms baselines in simulation and real-world tasks.

02 ZPRL improves success rates by 33.7% on average in real-world experiments.

03 ZPRL achieves better sample efficiency and smoother exploration behaviors.

Abstract

Pretrained imitation policies have become a strong foundation for robot manipulation, but they often require online improvement to overcome execution errors, limited dataset coverage, and deployment mismatch. A central question is therefore how reinforcement learning (RL) should adapt policies after offline pretraining. Existing lightweight methods commonly apply residual corrections directly in action space, but this often leads to noisy and poorly structured exploration. In this work, we propose Z-Perturbation Reinforcement Learning (ZPRL), an approach that steers pretrained policies through a compact bottleneck latent rather than through policy weights or output actions. During offline training, we augment the policy with a plug-and-play variational information bottleneck (VIB) module to extract a task-relevant latent interface from observation embeddings. During online finetuning,…

Code & Models

Repositories

https://manutdmoon.github.io/ZPRL
github

Datasets

ManUtdMoon/robomimicv030
dataset · 59 downloads
