Latent Policy Steering through One-Step Flow Policies
Hokyun Im, Andrey Kolobov, Jianlong Fu, Youngwoon Lee

TL;DR
This paper introduces Latent Policy Steering (LPS), a novel offline RL method that improves latent policy optimization by backpropagating action-space Q-gradients through a differentiable policy, achieving state-of-the-art results with minimal tuning.
Contribution
LPS eliminates proxy critics in latent RL, enabling end-to-end optimization guided by original-action-space critics and a differentiable policy prior, enhancing robustness and performance.
Findings
LPS outperforms behavioral cloning on OGBench and robotic tasks.
LPS achieves state-of-the-art performance with minimal hyperparameter tuning.
LPS demonstrates robust offline RL without proxy latent critics.
Abstract
Offline reinforcement learning (RL) allows robots to learn from offline datasets without risky exploration. Yet, offline RL's performance often hinges on a brittle trade-off between (1) return maximization, which can push policies outside the dataset support, and (2) behavioral constraints, which typically require sensitive hyperparameter tuning. Latent steering offers a structural way to stay within the dataset support during RL, but existing offline adaptations commonly approximate action values using latent-space critics learned via indirect distillation, which can lose information and hinder convergence. We propose Latent Policy Steering (LPS), which enables high-fidelity latent policy improvement by backpropagating original-action-space Q-gradients through a differentiable one-step MeanFlow policy to update a latent-action-space actor. By eliminating proxy latent critics, LPS…
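The core mechanism in the abstract, pushing an action-space Q-gradient back through a differentiable one-step policy to update a latent actor, can be illustrated with a toy scalar sketch. Everything below is an assumption made for illustration and is not the authors' implementation: `decode` is a hypothetical frozen one-step decoder standing in for the MeanFlow policy, `q_value` is a hypothetical hand-written critic, and the latent actor is a single parameter `theta` with `z = theta * s`. A real system would use neural networks and automatic differentiation rather than hand-derived gradients.

```python
import math

# Hypothetical frozen one-step decoder: maps (state, latent) to an action.
# tanh bounds the action, a toy stand-in for staying within dataset support.
def decode(state, z, w=1.5):
    return math.tanh(w * z + 0.5 * state)

def decode_grad_z(state, z, w=1.5):
    a = decode(state, z, w)
    return w * (1.0 - a * a)  # da/dz

# Hypothetical action-space critic: Q(s, a) = -(a - 0.3 * s)^2.
def q_value(state, action):
    return -(action - 0.3 * state) ** 2

def q_grad_action(state, action):
    return -2.0 * (action - 0.3 * state)  # dQ/da

def mean_q(states, theta):
    """Average Q of the actions produced by the latent actor z = theta * s."""
    return sum(q_value(s, decode(s, theta * s)) for s in states) / len(states)

def steer_latent_actor(states, theta=0.0, lr=0.1, steps=200):
    """Gradient ascent on Q(s, decode(s, z)) with respect to theta only."""
    for _ in range(steps):
        grad = 0.0
        for s in states:
            z = theta * s
            a = decode(s, z)
            # Chain rule: dQ/dtheta = dQ/da * da/dz * dz/dtheta.
            # No latent-space critic is involved; the decoder stays frozen.
            grad += q_grad_action(s, a) * decode_grad_z(s, z) * s
        theta += lr * grad / len(states)
    return theta

states = [-1.0, -0.5, 0.5, 1.0]
theta = steer_latent_actor(states)
print(mean_q(states, 0.0), mean_q(states, theta))
```

Only `theta` is updated here; the decoder and critic are left untouched, which mirrors the idea of steering a fixed policy prior from latent space using the original action-space critic.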
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
