Learning Native Continuation for Action Chunking Flow Policies

Yufeng Liu; Hang Yu; Juntu Zhao; Bocheng Li; Di Zhang; Mingzhu Li; Wenxuan Wu; Yingdong Hu; Junyuan Xie; Junliang Guo; Dequan Wang; Yang Gao

arXiv:2602.12978 · cs.RO · May 19, 2026

TL;DR

Legato is a training-time continuation method for action-chunked, flow-based vision-language-action (VLA) policies. It improves trajectory smoothness and task completion time by keeping the denoising dynamics consistent between training and inference.

Contribution

Legato introduces a continuation-based training approach that improves the smoothness and robustness of flow policies under real-time action chunking.

Findings

01 Legato produces smoother trajectories during execution.

02 Legato reduces spurious multimodal switching and hesitation.

03 Legato achieves roughly 10% improvement in task completion time.

Abstract

Action chunking enables Vision-Language-Action (VLA) models to run in real time, but naive chunked execution often exhibits discontinuities at chunk boundaries. Real-Time Chunking (RTC) alleviates this issue but is external to the policy, leading to spurious multimodal switching and trajectories that are not intrinsically smooth. We propose Legato, a training-time continuation method for action-chunked, flow-based VLA policies. Specifically, Legato initializes denoising from a schedule-shaped mixture of known actions and noise, exposing the model to partial action information. Moreover, Legato reshapes the learned flow dynamics so that the denoising process remains consistent between training and inference under per-step guidance. Legato further uses a randomized schedule condition during training to support varying inference delays and achieve controllable smoothness. Empirically,…
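
The initialization step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the schedule shape, so the linear ramp, the function names ramp_schedule and mix_init, and the delay and overlap parameters are all assumptions made for illustration. The sketch only shows the general idea of starting denoising from a per-step blend of known actions and Gaussian noise, with the delay sampled at random as a stand-in for the randomized schedule condition.

```python
import numpy as np


def ramp_schedule(horizon: int, delay: int, overlap: int) -> np.ndarray:
    """Hypothetical per-step weights in [0, 1].

    1.0 means the action is fully known (already committed), 0.0 means pure
    noise. The linear ramp is an assumption, not the schedule from the paper.
    """
    w = np.zeros(horizon)
    w[:delay] = 1.0  # steps that will execute before the new chunk is ready
    # Interior points of a linear decay from 1 to 0 across the overlap region.
    ramp = np.linspace(1.0, 0.0, overlap + 2)[1:-1]
    end = min(delay + overlap, horizon)
    n = max(end - delay, 0)
    w[delay:end] = ramp[:n]
    return w


def mix_init(known_actions: np.ndarray, weights: np.ndarray,
             rng: np.random.Generator) -> np.ndarray:
    """Blend known actions with Gaussian noise, one weight per time step."""
    noise = rng.standard_normal(known_actions.shape)
    w = weights[:, None]  # broadcast the weights over the action dimension
    return w * known_actions + (1.0 - w) * noise


rng = np.random.default_rng(0)
horizon, action_dim, max_delay = 8, 4, 3

# Assumed stand-in for the randomized schedule condition: sample a delay.
delay = int(rng.integers(0, max_delay + 1))
weights = ramp_schedule(horizon, delay, overlap=3)

# Placeholder for the overlapping actions from the previous chunk.
known = np.ones((horizon, action_dim))
x_init = mix_init(known, weights, rng)
```

In this sketch the first delay steps of x_init equal the known actions exactly, the overlap steps are partly noised, and the remaining steps are pure noise, which is one plausible reading of "partial action information".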

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning