Translating Flow to Policy via Hindsight Online Imitation

Yitian Zheng; Zhangchen Ye; Weijun Dong; Shengjie Wang; Yuyang Liu; Chongjie Zhang; Chuan Wen; Yang Gao

arXiv:2512.19269·cs.RO·February 13, 2026

Open Access 3 Reviews

TL;DR

HinFlow improves low-level robot policies by relabeling high-level goals online from achieved outcomes, enabling scalable learning from diverse data sources and significantly improving manipulation performance.

Contribution

The paper introduces HinFlow, a method that relabels high-level goals online from the outcomes of robot interactions to improve policy learning, enabling transfer from video data and outperforming existing methods.

Findings

01. Achieves over 2x performance improvement in manipulation tasks.

02. Enables policy learning from cross-embodiment video data.

03. Significantly outperforms existing methods in diverse tasks.

Abstract

Recent advances in hierarchical robot systems leverage a high-level planner to propose task plans and a low-level policy to generate robot actions. This design allows training the planner on action-free or even non-robot data sources (e.g., videos), providing transferable high-level guidance. Nevertheless, grounding these high-level plans into executable actions remains challenging, especially with the limited availability of high-quality robot data. To this end, we propose to improve the low-level policy through online interactions. Specifically, our approach collects online rollouts, retrospectively annotates the corresponding high-level goals from achieved outcomes, and aggregates these hindsight-relabeled experiences to update a goal-conditioned imitation policy. Our method, Hindsight Flow-conditioned Online Imitation (HinFlow), instantiates this idea with 2D point flows as the…
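
The loop described in the abstract (collect rollouts, relabel goals from achieved outcomes, aggregate, update a goal-conditioned imitation policy) can be illustrated with a toy sketch. This is not the paper's implementation: it assumes a single tracked 2D point, linear dynamics given by a hypothetical `GAIN` matrix, a linear policy `a = W @ g`, a one-step flow as the goal, and a random stand-in for the high-level planner. All function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy linear dynamics (assumption): point moves by GAIN @ action. Unknown to the policy.
GAIN = np.array([[0.5, 0.2], [-0.1, 0.8]])

def rollout(W, planned_flows, noise=0.1):
    """Run the linear flow-conditioned policy a = W @ g with exploration noise."""
    point = np.zeros(2)
    points, actions = [point.copy()], []
    for g in planned_flows:
        a = W @ g + noise * rng.standard_normal(2)
        point = point + GAIN @ a
        points.append(point.copy())
        actions.append(a)
    return np.array(points), np.array(actions)

def hindsight_relabel(points, actions):
    """Label each action with the one-step flow that was actually achieved."""
    achieved_flows = points[1:] - points[:-1]
    return achieved_flows, actions

def fit_policy(flows, actions):
    """Goal-conditioned behavior cloning by least squares: actions ~ flows @ W.T."""
    W_transposed, *_ = np.linalg.lstsq(flows, actions, rcond=None)
    return W_transposed.T

def train(iterations=5, steps=20):
    """Online loop: roll out, relabel in hindsight, aggregate, refit."""
    W = np.zeros((2, 2))  # initial policy ignores the goal; noise drives exploration
    buffer_flows, buffer_actions = [], []
    for _ in range(iterations):
        planned = rng.uniform(-1.0, 1.0, size=(steps, 2))  # stand-in planner output
        points, actions = rollout(W, planned)
        flows, acts = hindsight_relabel(points, actions)
        buffer_flows.append(flows)
        buffer_actions.append(acts)
        W = fit_policy(np.concatenate(buffer_flows), np.concatenate(buffer_actions))
    return W
```

Calling `W = train()` should, in this noise-free toy setting, yield a policy for which `GAIN @ W` is close to the identity, meaning a commanded flow is reproduced by the resulting motion. The point of the sketch is that the relabeled pairs are always valid supervision, even when the rollout failed to follow the planned flow.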

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

The key idea, hindsight relabeling of achieved flows for dense supervision during online practice, is a well-established data augmentation technique that enables sample-efficient policy training. The experiments highlight robustness and versatility in zero-shot generalization to novel objects and distractors when the planner covers those visuals. The paper motivates flows as a compact, appearance-robust high-level representation and positions HinFlow as a simple bridge that grounds those plans into executa

Weaknesses

The proposed hindsight relabeling depends on flow quality and on point tracking/segmentation. Since the results are in simulation, where this is feasible, it is unclear whether the approach is robust to sim2real gaps and works reliably in the real world. Point tracking is also challenging because of 2D ambiguity for 3D motions, occlusions, and high-speed motions. The policy trains on achieved flows but infers under predicted flows, and the potential distribution shift has not been analyzed.

Reviewer 02 · Rating 4 · Confidence 3

Strengths

- The proposed method effectively mitigates the data scarcity problem of grounding point flow to environmental actions via hindsight relabeling.
- The extensive evaluations show that, compared to the baselines, the framework can not only refine itself iteratively with relabeled data but also unlock the full potential of the flow representation in task and embodiment generalization.
- A single-task policy can converge to high performance with a small number of environment interactions, highlighting the sample

Weaknesses

- Both point flow representations and hindsight relabeling have been widely investigated by prior works, for generalizability and for resolving data scarcity issues, respectively. For video/flow-based methods, more baselines should be considered, such as UniPi [1] and FlowDiffusion [2]. It would also be helpful to explain why, in the Place Sphere task, the performance shows a decreasing-then-increasing trend.
- The success of the proposed method heavily depends on the flow quality predicted by the high-lev

Reviewer 03 · Rating 8 · Confidence 4

Strengths

- The paper tackles a timely and important challenge in hierarchical robot learning: grounding high-level, video-derived plans into reliable low-level control.
- The proposed hindsight flow-conditioning idea is simple, intuitive, and well-motivated, yet leads to strong performance gains with good sample efficiency, achieving large improvements within 80K online interaction steps.
- The experiments span a diverse set of seven manipulation tasks across two widely used benchmarks (LIBERO and Mani

Weaknesses

While the reviewer is positive about the contributions and overall quality of this work, some clarifications and additions would further improve its clarity and impact:

- **Baseline comparison with non-hierarchical online learning.** The current baselines emphasize either offline imitation (BC) or hierarchical approaches. Including a pure online RL baseline, such as PPO (even if it is expected to underperform within the 80K interaction budget), would help isolate the benefit of the flow-based

Taxonomy

Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · Social Robot Interaction and HRI