TL;DR
This paper introduces Action Coherence Guidance (ACG), a training-free test-time method that enhances action coherence in flow-based vision-language-action models, leading to improved stability and success in robotic manipulation tasks.
Contribution
ACG is a training-free, test-time guidance algorithm for flow-based VLA models. It improves action coherence and counters these models' sensitivity to noise in human demonstrations.
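This summary does not spell out ACG's update rule, so the sketch below illustrates only the generic pattern of test-time guidance for a flow-based sampler: integrate a velocity field with Euler steps while extrapolating away from a reference field. The names `guided_euler_sample`, `v_main`, `v_ref`, and `scale`, and the toy velocity fields, are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def guided_euler_sample(v_main, v_ref, x0, scale=1.0, steps=10):
    """Euler-integrate dx/dt = v_main + scale * (v_main - v_ref) from t=0 to t=1.

    Generic guidance pattern only; how ACG builds its reference field
    is not described in this summary.
    """
    x = np.asarray(x0, dtype=float).copy()
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        vm = v_main(x, t)   # velocity from the main policy
        vr = v_ref(x, t)    # velocity from the reference to steer away from
        x = x + dt * (vm + scale * (vm - vr))
    return x

# Toy, hypothetical velocity fields standing in for a real policy network.
target = np.array([1.0, -1.0])
v_main = lambda x, t: target - x          # pulls the sample toward `target`
v_ref = lambda x, t: np.zeros_like(x)     # uninformative reference field

x0 = np.zeros(2)
unguided = guided_euler_sample(v_main, v_ref, x0, scale=0.0)
guided = guided_euler_sample(v_main, v_ref, x0, scale=0.5)
```

With `scale=0.0` the sampler reduces to plain Euler integration of `v_main`, so guidance needs no retraining and only changes how the velocities are combined at inference time.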
Findings
ACG improves action coherence on RoboCasa, DexMimicGen, and real-world SO-101 tasks.
ACG increases success rates in robotic manipulation tasks.
ACG enhances stability and reduces trajectory drift.
Abstract
Diffusion and flow matching models have emerged as powerful robot policies, enabling Vision-Language-Action (VLA) models to generalize across diverse scenes and instructions. Yet, when trained via imitation learning, their high generative capacity makes them sensitive to noise in human demonstrations, such as jerks, pauses, and jitter, which reduces action coherence. Reduced action coherence causes instability and trajectory drift during deployment, and these failures are catastrophic in fine-grained manipulation, where precision is crucial. In this paper, we present Action Coherence Guidance (ACG) for VLA models, a training-free test-time guidance algorithm that improves action coherence and thereby yields performance gains. Evaluated on RoboCasa, DexMimicGen, and real-world SO-101 tasks, ACG consistently improves action coherence and boosts success rates across diverse manipulation tasks. Code and…
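To make "action coherence" concrete, one simple proxy is the roughness of an action chunk, measured by its second-order differences. This is an illustrative assumption, not necessarily the metric used in the paper. The function name `action_roughness` and the synthetic trajectories are hypothetical.

```python
import numpy as np

def action_roughness(actions):
    """Mean squared second-order difference of an action chunk of shape (T, D).

    Lower values mean a smoother trajectory. This is an assumed proxy
    for coherence, not the paper's metric.
    """
    actions = np.asarray(actions, dtype=float)
    if actions.ndim != 2 or actions.shape[0] < 3:
        raise ValueError("expected shape (T, D) with T >= 3")
    accel = np.diff(actions, n=2, axis=0)  # discrete acceleration per step
    return float(np.mean(accel ** 2))

# Synthetic example: a straight-line trajectory versus the same one with jitter.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 16)[:, None]
smooth = np.hstack([t, 2.0 * t])
jittery = smooth + 0.05 * rng.standard_normal(smooth.shape)

smooth_score = action_roughness(smooth)
jittery_score = action_roughness(jittery)
```

The straight-line chunk scores essentially zero, while the jittered copy scores higher, which is the kind of gap a coherence-improving method would aim to close.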
