OFlow: Injecting Object-Aware Temporal Flow Matching for Robust Robotic Manipulation
Kuanning Wang, Ke Fan, Chenhao Qiu, Zeyu Shangguan, Yuqian Fu, Yanwei Fu, Daniel Seita, Xiangyang Xue

TL;DR
OFlow introduces a unified framework that combines temporal prediction and object-aware reasoning in a shared latent space, improving robustness in robotic manipulation tasks.
Contribution
It unifies future scene prediction and object reasoning in a shared semantic latent space, enhancing control robustness under distribution shifts.
Findings
Object-aware foresight improves robustness across multiple benchmarks.
Integrating OFlow into VLA pipelines enhances success in real-world tasks.
The method effectively filters task-irrelevant variations in scene understanding.
Abstract
Robust robotic manipulation requires not only predicting how the scene evolves over time, but also recognizing task-relevant objects in complex scenes. However, existing VLA models face two limitations. They typically act only on the current frame, while future prediction and object-aware reasoning are often learned in separate latent spaces. We propose OFlow (injecting Object-Aware Temporal Flow Matching into VLAs), a framework that addresses both limitations by unifying temporal foresight and object-aware reasoning in a shared semantic latent space. Our method forecasts future latents with temporal flow matching, factorizes them into object-aware representations that emphasize physically relevant cues while filtering task-irrelevant variation, and conditions continuous action generation on these predictions. By integrating OFlow into VLA pipelines, our method enables more reliable…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
