DFM-VLA: Iterative Action Refinement for Robot Manipulation via Discrete Flow Matching

Jiayi Chen, Wenxuan Song, Shuai Chen, Jingbo Wang, Zhijun Li, and Haoang Li

arXiv:2603.26320·cs.RO·April 8, 2026

TL;DR

DFM-VLA introduces an iterative discrete flow matching approach for robotic manipulation, enabling dynamic action token refinement and outperforming existing decoding methods in accuracy and efficiency.

Contribution

The paper proposes a discrete flow matching framework that iteratively refines action tokens in vision-language-action models for robotics.

Findings

01 DFM-VLA outperforms autoregressive and diffusion baselines on manipulation tasks.

02 It achieves a 95.7% success rate on LIBERO.

03 It attains an average success length of 4.44 on CALVIN.

Abstract

Vision-Language-Action (VLA) models that encode actions using a discrete tokenization scheme are increasingly adopted for robotic manipulation, but existing decoding paradigms remain fundamentally limited. Whether actions are decoded sequentially by autoregressive VLAs or in parallel by discrete diffusion VLAs, once a token is generated, it is typically fixed and cannot be revised in subsequent iterations, so early token errors cannot be effectively corrected later. We propose DFM-VLA, a discrete flow matching VLA for iterative refinement of action tokens. DFM-VLA models a token-level probability velocity field that dynamically updates the full action sequence across refinement iterations. We investigate two ways to construct the velocity field: an auxiliary velocity-head formulation and an action-embedding-guided formulation. Our framework further adopts a two-stage decoding strategy…
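
To make the idea of revisable tokens concrete, here is a minimal sketch of a generic discrete flow matching sampler. It is not the authors' implementation: the abstract does not give the velocity parameterization, the scheduler, or the two-stage decoding details, so the sketch assumes a mixture probability path with a linear scheduler and a uniform-noise starting sequence. The names `euler_refine`, `toy_predictor`, and `TARGET` are hypothetical, and `toy_predictor` stands in for a VLA that would condition on images and a language instruction.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the vocabulary axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def euler_refine(predict_logits, seq_len, vocab_size, num_steps=8, seed=0):
    """Euler sampling of a token-level probability velocity (assumed form).

    Assumption: mixture path with linear scheduler kappa_t = t, so each
    token jumps to a sample from the predicted clean-token distribution
    with probability h / (1 - t) and otherwise keeps its current value.
    Every position can change at every step, so no token is frozen early.
    """
    rng = np.random.default_rng(seed)
    tokens = rng.integers(0, vocab_size, size=seq_len)  # uniform-noise start
    h = 1.0 / num_steps
    for k in range(num_steps):
        t = k * h
        # Predicted distribution over clean tokens, shape (seq_len, vocab_size).
        probs = softmax(predict_logits(tokens, t))
        proposals = np.array([rng.choice(vocab_size, p=p) for p in probs])
        jump_prob = min(1.0, h / (1.0 - t))  # reaches 1 at the final step
        jump = rng.random(seq_len) < jump_prob
        tokens = np.where(jump, proposals, tokens)
    return tokens

# Hypothetical stand-in for the model: always prefers a fixed target sequence.
TARGET = np.array([3, 1, 4, 1, 5, 9, 2])

def toy_predictor(tokens, t):
    logits = np.zeros((len(TARGET), 16))
    logits[np.arange(len(TARGET)), TARGET] = 50.0
    return logits

refined = euler_refine(toy_predictor, seq_len=len(TARGET), vocab_size=16)
print(refined)
```

Under these assumptions the jump probability grows as t approaches 1, so early steps change only a few tokens while later steps can still overwrite any of them, which is the property the abstract contrasts with autoregressive and discrete diffusion decoding.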

Code & Models

Repositories

https://chris1220313648.github.io/DFM-VLA