Mean-Flow based One-Step Vision-Language-Action

Yang Chen; Xiaoguang Ma; Bin Zhao

arXiv:2603.01469·cs.RO·March 3, 2026

TL;DR

This paper introduces a Mean-Flow based One-Step Vision-Language-Action (VLA) approach that generates actions in a single step, reducing generation latency and enabling faster robotic manipulation without sacrificing performance.

Contribution

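The paper proposes a Mean-Flow based method that resolves noise-induced issues in action generation and removes the consistency constraints of conventional Flow-Matching methods, allowing efficient one-step action generation for robotic tasks.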

Findings

01. The proposed one-step VLA generates actions 8.7 times faster than SmolVLA.

02. It generates actions 83.9 times faster than Diffusion Policy.

03. The approach is effective in real-world robotic experiments.

Abstract

Recent advances in Flow-Matching-based Vision-Language-Action (VLA) frameworks have demonstrated remarkable advantages in generating high-frequency action chunks, particularly for highly dexterous robotic manipulation tasks. Despite these achievements, their practical application is constrained by prolonged generation latency, which stems from inherent iterative sampling requirements and architectural limitations. To address this bottleneck, we propose a Mean-Flow based One-Step VLA approach. Specifically, we resolve the noise-induced issues in the action generation process, thereby eliminating the consistency constraints inherent to conventional Flow-Matching methods. This significantly enhances generation efficiency and enables one-step action generation. Real-world robotic experiments show that the generation speed of the proposed Mean-Flow based One-Step VLA is 8.7…
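
The contrast the abstract draws between iterative sampling and one-step generation can be illustrated with a toy sketch. This is not the paper's model. The scalar velocity field `v(z, t) = z`, the function names, and the closed-form average velocity are all assumptions chosen so the example runs without a trained network. In a real system both velocities would presumably be learned.

```python
import math

def instantaneous_velocity(z, t):
    # Hypothetical toy field v(z, t) = z, chosen only because its ODE
    # dz/dt = z has the closed-form solution z(t) = z(1) * exp(t - 1).
    return z

def average_velocity(z_t, r, t):
    # Average of v along the trajectory through (z_t, t) over [r, t]:
    # u = (z_t - z_r) / (t - r), with z_r = z_t * exp(r - t) for this toy field.
    z_r = z_t * math.exp(r - t)
    return (z_t - z_r) / (t - r)

def sample_iterative(z1, num_steps):
    # Euler integration from t = 1 (noise) down to t = 0 (action);
    # costs one velocity evaluation per step.
    z, dt = z1, 1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i * dt
        z = z - dt * instantaneous_velocity(z, t)
    return z

def sample_one_step(z1):
    # A single evaluation of the average velocity over the whole interval [0, 1].
    return z1 - (1.0 - 0.0) * average_velocity(z1, 0.0, 1.0)

z1 = 2.0                              # stand-in for a noise sample
exact = z1 * math.exp(-1.0)           # true endpoint of the toy ODE at t = 0
ten_step = sample_iterative(z1, 10)   # 10 evaluations, still carries Euler error
one_step = sample_one_step(z1)        # 1 evaluation, matches the exact endpoint

print(f"exact={exact:.4f}  ten_step={ten_step:.4f}  one_step={one_step:.4f}")
```

In this toy setting the iterative sampler needs many evaluations to approach the true endpoint, while a single step along the interval-averaged velocity lands on it directly. The sketch only shows why replacing an instantaneous velocity with an averaged one can remove the iteration; it says nothing about how the paper trains or parameterizes its model.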

Taxonomy

Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics