ACG: Action Coherence Guidance for Flow-based Vision-Language-Action models

Minho Park; Kinam Kim; Junha Hyung; Hyojin Jang; Hoiyeong Jin; Jooyeol Yun; Hojoon Lee; Jaegul Choo

arXiv:2510.22201 · cs.RO · March 26, 2026

TL;DR

This paper introduces Action Coherence Guidance (ACG), a training-free test-time method that enhances action coherence in flow-based vision-language-action models, leading to improved stability and higher success rates in robotic manipulation tasks.

Contribution

The paper presents ACG, a novel, training-free test-time guidance algorithm that improves action coherence in flow-based VLA models by addressing their sensitivity to noise in human demonstrations.
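The listing does not say how ACG computes its guidance signal. The sketch below illustrates what "training-free test-time guidance" can look like in a flow-matching policy in general, not the paper's method. It integrates a velocity field from noise (t = 0) to an action chunk (t = 1) with Euler steps. At each step it extrapolates away from a second velocity estimate, in the style of classifier-free guidance.

The following pieces are hypothetical stand-ins, not the paper's model or its guidance construction:
- the functions `velocity` and `incoherent_velocity`
- the guidance scale `w`
- the toy straight-line velocity

```python
from typing import Callable, List

Vector = List[float]
VelocityFn = Callable[[Vector, float], Vector]


def make_straight_line_velocity(target: Vector) -> VelocityFn:
    """Hypothetical toy rectified-flow velocity: its ODE carries x_t straight to `target` at t=1."""
    def velocity(x: Vector, t: float) -> Vector:
        return [(a - xi) / (1.0 - t) for a, xi in zip(target, x)]
    return velocity


def guided_euler_sample(
    x0: Vector,
    velocity: VelocityFn,
    incoherent_velocity: VelocityFn,
    w: float,
    num_steps: int = 10,
) -> Vector:
    """Euler-integrate dx/dt = v_guided(x, t) from t=0 (noise) to t=1 (actions).

    Assumed CFG-style rule: v_guided = v + w * (v - v_incoherent), which pushes
    the sample away from what a hypothetical incoherent estimate predicts.
    With w = 0 this is plain Euler sampling of `velocity`.
    """
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays below 1, so the toy velocity never divides by zero
        v = velocity(x, t)
        v_bad = incoherent_velocity(x, t)
        # Extrapolate from the incoherent estimate through (and past) the base estimate.
        v_guided = [vi + w * (vi - bi) for vi, bi in zip(v, v_bad)]
        x = [xi + dt * gi for xi, gi in zip(x, v_guided)]
    return x
```

For example, `guided_euler_sample([0.0], make_straight_line_velocity([1.0]), lambda x, t: [0.0], w=0.5, num_steps=1)` returns `[1.5]`: the guidance term scales the base velocity by `1 + w`. When the two estimates agree, or when `w = 0`, the guidance vanishes. In this sketch, guidance changes only how velocities are combined at inference, so it needs no retraining. The cost is one extra velocity evaluation per step.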

Findings

01. ACG improves action coherence across RoboCasa, DexMimicGen, and real-world SO-101 tasks.
02. ACG increases success rates in robotic manipulation tasks.
03. ACG improves stability and reduces trajectory drift.

Abstract

Diffusion and flow matching models have emerged as powerful robot policies, enabling Vision-Language-Action (VLA) models to generalize across diverse scenes and instructions. Yet, when trained via imitation learning, their high generative capacity makes them sensitive to noise in human demonstrations: jerks, pauses, and jitter which reduce action coherence. Reduced action coherence causes instability and trajectory drift during deployment, failures that are catastrophic in fine-grained manipulation where precision is crucial. In this paper, we present Action Coherence Guidance (ACG) for VLA models, a training-free test-time guidance algorithm that improves action coherence and thereby yields performance gains. Evaluated on RoboCasa, DexMimicGen, and real-world SO-101 tasks, ACG consistently improves action coherence and boosts success rates across diverse manipulation tasks. Code and…
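The abstract attributes reduced coherence to jerks, pauses, and jitter, but this listing does not say how coherence is measured. One simple, hypothetical proxy is the mean squared second finite difference of consecutive actions in a chunk. It is zero for constant-velocity motion and grows as the motion becomes more jittery. The name `jitter_score` and the choice of metric are illustrative assumptions, not the paper's evaluation protocol.

```python
from typing import Sequence


def jitter_score(actions: Sequence[Sequence[float]]) -> float:
    """Hypothetical coherence proxy: mean squared second difference over an action chunk.

    actions: T timesteps, each a D-dimensional action (e.g. joint targets).
    Returns 0.0 for straight, constant-velocity motion; larger values mean
    more abrupt changes in velocity (jitter).
    """
    if len(actions) < 3:
        return 0.0
    total = 0.0
    count = 0
    for t in range(1, len(actions) - 1):
        prev, cur, nxt = actions[t - 1], actions[t], actions[t + 1]
        for p, c, n in zip(prev, cur, nxt):
            second_diff = n - 2.0 * c + p  # discrete acceleration
            total += second_diff ** 2
            count += 1
    return total / count if count else 0.0
```

For instance, a steady ramp `[[0.0], [1.0], [2.0], [3.0]]` scores `0.0`. An oscillating chunk `[[0.0], [1.0], [0.0], [1.0]]` scores `4.0`, because each interior step reverses direction.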
