OFlow: Injecting Object-Aware Temporal Flow Matching for Robust Robotic Manipulation

Kuanning Wang; Ke Fan; Chenhao Qiu; Zeyu Shangguan; Yuqian Fu; Yanwei Fu; Daniel Seita; Xiangyang Xue

arXiv:2604.17876 · cs.RO · April 21, 2026

TL;DR

OFlow introduces a unified framework that combines temporal prediction and object-aware reasoning in a shared latent space, improving robustness in robotic manipulation tasks.

Contribution

OFlow unifies future scene prediction and object-aware reasoning in a shared semantic latent space, making control more robust under distribution shifts.

Findings

01 Object-aware foresight improves robustness across multiple benchmarks.

02 Integrating OFlow into VLA pipelines improves success rates on real-world tasks.

03 The method filters out task-irrelevant variation in the scene.

Abstract

Robust robotic manipulation requires not only predicting how the scene evolves over time, but also recognizing task-relevant objects in complex scenes. However, existing vision-language-action (VLA) models face two limitations: they typically act only on the current frame, and they often learn future prediction and object-aware reasoning in separate latent spaces. We propose OFlow (injecting Object-Aware Temporal Flow Matching into VLAs), a framework that addresses both limitations by unifying temporal foresight and object-aware reasoning in a shared semantic latent space. Our method forecasts future latents with temporal flow matching, factorizes them into object-aware representations that emphasize physically relevant cues while filtering task-irrelevant variation, and conditions continuous action generation on these predictions. By integrating OFlow into VLA pipelines, our method enables more reliable…
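
For readers unfamiliar with flow matching, the sketch below illustrates the generic idea behind forecasting a latent with a learned velocity field. It is not OFlow's implementation: it assumes the common straight-line (linear interpolation) formulation, where `x0` stands in for a source sample and `x1` for a target such as a future latent. All function names are hypothetical, and the abstract does not specify the paper's actual parameterization, object-aware factorization, or action head.

```python
# Generic flow matching sketch (assumption: straight-line probability path).
# Not the OFlow method; names and shapes are illustrative only.

def interpolate(x0, x1, t):
    """Point on the straight-line path from x0 (t=0) to x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Velocity of the straight-line path; it is constant in t."""
    return [b - a for a, b in zip(x0, x1)]

def flow_matching_loss(velocity_fn, x0, x1, t):
    """Mean squared error between predicted and target velocity at (x_t, t)."""
    x_t = interpolate(x0, x1, t)
    pred = velocity_fn(x_t, t)
    target = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)

def euler_sample(velocity_fn, x0, num_steps=10):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

if __name__ == "__main__":
    # Toy check with an oracle velocity field for one fixed (x0, x1) pair.
    x0, x1 = [0.0, 1.0], [2.0, -1.0]
    oracle = lambda x, t: target_velocity(x0, x1)
    print(flow_matching_loss(oracle, x0, x1, 0.3))
    print(euler_sample(oracle, x0))
```

In practice `velocity_fn` would be a neural network trained to minimize this loss over many sampled pairs and times. The oracle here only serves to show that integrating the correct velocity field carries `x0` to `x1`.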
