Latent Policy Steering through One-Step Flow Policies

Hokyun Im; Andrey Kolobov; Jianlong Fu; Youngwoon Lee

arXiv:2603.05296 · cs.RO · March 6, 2026

Open Access

TL;DR

This paper introduces Latent Policy Steering (LPS), an offline RL method that improves a latent-space actor by backpropagating Q-gradients from the original action space through a differentiable one-step MeanFlow policy. The authors report state-of-the-art results with minimal hyperparameter tuning.

Contribution

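LPS removes the proxy latent-space critics used in existing latent RL methods. The latent actor is instead optimized end to end, guided by a critic defined in the original action space and by a differentiable policy prior, which the authors present as improving robustness and performance.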

Findings

01. LPS outperforms behavioral cloning on OGBench and robotic tasks.

02. LPS achieves state-of-the-art performance with minimal hyperparameter tuning.

03. LPS demonstrates robust offline RL without proxy latent critics.

Abstract

Offline reinforcement learning (RL) allows robots to learn from offline datasets without risky exploration. Yet, offline RL's performance often hinges on a brittle trade-off between (1) return maximization, which can push policies outside the dataset support, and (2) behavioral constraints, which typically require sensitive hyperparameter tuning. Latent steering offers a structural way to stay within the dataset support during RL, but existing offline adaptations commonly approximate action values using latent-space critics learned via indirect distillation, which can lose information and hinder convergence. We propose Latent Policy Steering (LPS), which enables high-fidelity latent policy improvement by backpropagating original-action-space Q-gradients through a differentiable one-step MeanFlow policy to update a latent-action-space actor. By eliminating proxy latent critics, LPS…
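
The core mechanism named in the abstract, pushing an action-space Q-gradient back through a differentiable one-step policy to update a latent actor, can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a frozen linear map as a stand-in for the MeanFlow policy, a frozen quadratic critic, and a linear latent actor; every name below (`W_dec`, `U_dec`, `T`, `M`, `decode`, `train`) is hypothetical and chosen only so the chain rule can be written out by hand.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for the toy problem.
STATE_DIM, LATENT_DIM, ACTION_DIM = 3, 2, 4

# Frozen stand-in for the one-step policy: a = W_dec @ z + U_dec @ s.
# Assumption: a linear map replaces the MeanFlow policy so that the
# Jacobian da/dz = W_dec is available in closed form.
W_dec = rng.normal(size=(ACTION_DIM, LATENT_DIM))
U_dec = rng.normal(size=(ACTION_DIM, STATE_DIM))

# Frozen stand-in for the action-space critic: Q(s, a) = -||a - T s||^2.
T = rng.normal(size=(ACTION_DIM, STATE_DIM))


def decode(s, z):
    """Map a latent action z to an original-space action a."""
    return W_dec @ z + U_dec @ s


def q_value(s, a):
    """Toy critic defined directly on original-space actions."""
    diff = a - T @ s
    return -float(diff @ diff)


def grad_q_wrt_action(s, a):
    """Closed-form dQ/da for the toy critic."""
    return -2.0 * (a - T @ s)


def latent_actor(M, s):
    """Linear latent actor: z = M s. Only M is trained."""
    return M @ s


def actor_gradient(M, s):
    """Chain rule: dQ/dM = (da/dz)^T dQ/da, outer-multiplied with s."""
    z = latent_actor(M, s)
    a = decode(s, z)
    dq_dz = W_dec.T @ grad_q_wrt_action(s, a)
    return np.outer(dq_dz, s)


def mean_q(M, states):
    """Average critic value of the decoded actions over a batch."""
    return float(np.mean([q_value(s, decode(s, latent_actor(M, s)))
                          for s in states]))


def train(M, states, lr=0.01, steps=200):
    """Gradient ascent on Q with respect to the latent actor only."""
    for _ in range(steps):
        grad = np.mean([actor_gradient(M, s) for s in states], axis=0)
        M = M + lr * grad
    return M


states = rng.normal(size=(64, STATE_DIM))
M0 = np.zeros((LATENT_DIM, STATE_DIM))
M1 = train(M0, states)
q_before = mean_q(M0, states)
q_after = mean_q(M1, states)
print("mean Q before:", q_before, "after:", q_after)
```

In this sketch no critic is ever fit in the latent space: the only learning signal is the action-space gradient routed through the decoder's Jacobian, and the decoder and critic stay fixed while the latent actor moves. In practice the Jacobian would come from automatic differentiation rather than a hand-written formula.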

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis