VFP: Variational Flow-Matching Policy for Multi-Modal Robot Manipulation

Xuanran Zhai; Qianyou Zhao; Qiaojun Yu; Ce Hao

arXiv:2508.01622 · cs.RO · October 3, 2025

TL;DR

VFP introduces a variational flow-matching policy with mode-aware action generation, using Kantorovich optimal transport for distribution-level alignment and a mixture-of-experts decoder for mode specialization, to improve multi-modal robot manipulation in both simulated and real-world tasks.

Contribution

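The paper proposes VFP, a flow-matching policy that captures both task-level and trajectory-level multi-modality by combining a variational latent prior, Kantorovich optimal transport, and a mixture-of-experts decoder, so that generated actions follow distinct demonstrated modes instead of collapsing to averaged behavior.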
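The page does not describe VFP's actual encoder, prior, or decoder, so the following is only a toy sketch of the general mechanism: a latent variable selects a mode-specific expert, and the action is produced by integrating a flow-matching velocity field from noise. It assumes 1-D actions with two demonstrated modes, a scalar latent z ~ N(0, 1), straight (rectified-flow style) probability paths, and a hard top-1 gate; `MODES`, `route`, `expert_velocity`, `sample_action`, and `sample_policy` are hypothetical names, not the authors' code.

```python
import random

# Hypothetical toy setup (not from the paper): 1-D actions whose
# demonstrations cluster at two modes, +1.0 and -1.0.
MODES = [1.0, -1.0]


def route(z):
    """Hypothetical top-1 gate: the sign of the latent z picks one expert."""
    return 0 if z >= 0.0 else 1


def expert_velocity(k, x, t):
    """Straight-path velocity toward mode k: v = (x1 - x_t) / (1 - t)."""
    return (MODES[k] - x) / (1.0 - t)


def sample_action(z, x0, steps=10):
    """Euler-integrate dx/dt = v_k(x, t) from t=0 to t=1, with k = route(z)."""
    k = route(z)  # only the selected expert is ever evaluated
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        x += dt * expert_velocity(k, x, i * dt)
    return x


def sample_policy(n, seed=0):
    """Draw z from the assumed N(0, 1) latent prior and x0 ~ N(0, 1) as noise."""
    rng = random.Random(seed)
    return [sample_action(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
```

Because each expert's velocity points straight at its mode, every sample from `sample_policy` lands on +1.0 or -1.0 and never on their average 0.0; different latent draws yield different modes. Evaluating only the routed expert per sample is one simple way a mixture-of-experts decoder can keep inference cheap.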

Findings

1. Achieves a 49% improvement in task success rate over baselines in simulation.
2. Outperforms standard flow-based policies on real-robot tasks.
3. Maintains fast inference and a compact model size.

Abstract

Flow-matching-based policies have recently emerged as a promising approach for learning-based robot manipulation, offering significant acceleration in action sampling compared to diffusion-based policies. However, conventional flow-matching methods struggle with multi-modality, often collapsing to averaged or ambiguous behaviors in complex manipulation tasks. To address this, we propose the Variational Flow-Matching Policy (VFP), which introduces a variational latent prior for mode-aware action generation and effectively captures both task-level and trajectory-level multi-modality. VFP further incorporates Kantorovich Optimal Transport (K-OT) for distribution-level alignment and utilizes a Mixture-of-Experts (MoE) decoder for mode specialization and efficient inference. We comprehensively evaluate VFP on 41 simulated tasks and 3 real-robot tasks, demonstrating its effectiveness and…
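The abstract does not specify how K-OT is applied inside VFP. As a hedged illustration of what distribution-level alignment can mean, this sketch computes the exact Kantorovich OT cost between a small batch of generated actions and a batch of demonstration actions. It assumes a squared-error ground cost, uniform weights, and equal batch sizes. Under those assumptions an optimal plan can be taken to be a permutation, so brute-force search is exact but only practical for tiny batches; entropic solvers such as Sinkhorn are the usual substitute at scale. `kantorovich_ot` is a hypothetical helper.

```python
import itertools


def kantorovich_ot(generated, demos, cost=lambda a, b: (a - b) ** 2):
    """Exact discrete OT between two equal-size, uniformly weighted batches.

    With uniform marginals and equal sizes, some optimal Kantorovich plan is a
    permutation matrix (Birkhoff-von Neumann), so enumerating permutations
    yields the exact optimum. Cost grows as n!, so keep n very small.
    Returns (mean transport cost, assignment of generated[i] -> demos[perm[i]]).
    """
    n = len(generated)
    if n != len(demos):
        raise ValueError("batches must have equal size")
    best_cost, best_perm = float("inf"), None
    for perm in itertools.permutations(range(n)):
        c = sum(cost(generated[i], demos[perm[i]]) for i in range(n)) / n
        if c < best_cost:
            best_cost, best_perm = c, perm
    return best_cost, best_perm
```

For demonstrations `[-1.0, 1.0, 1.0]`, a generated batch that covers both modes, such as `[0.9, -1.1, 1.2]`, has a transport cost of about 0.02. A mode-collapsed batch `[0.0, 0.0, 0.0]` costs 1.0, so a batch-level OT objective penalizes averaged behavior even though each collapsed action is individually "close" to both modes.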

Taxonomy

Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · Human Pose and Action Recognition