Overcoming Dynamics-Blindness: Training-Free Pace-and-Path Correction for VLA Models
Yanyan Zhang, Chaoda Song, Vikash Singh, Xinpeng Li, Kai Ye, Zhe Hu, Zhongzhu Pu, Yu Yin, Vipin Chaudhary

TL;DR
This paper introduces a training-free, inference-time correction method that helps Vision-Language-Action (VLA) models handle non-stationary, dynamic scenarios, improving success rates by up to 28.8%.
Contribution
It proposes Pace-and-Path Correction, a closed-form operator that wraps any chunked-action VLA and corrects for temporal dynamics at inference time, without retraining.
Findings
Outperforms state-of-the-art training-free wrappers and adaptive methods.
Improves success rates by up to 28.8% in dynamic environments.
Demonstrates effectiveness on the MoveBench diagnostic benchmark.
Abstract
Vision-Language-Action (VLA) models achieve remarkable flexibility and generalization beyond classical control paradigms. However, most prevailing VLAs are trained under a single-frame observation paradigm, which leaves them structurally blind to temporal dynamics. Consequently, these models degrade severely in non-stationary scenarios, even when trained or finetuned on dynamic datasets. Existing approaches either require expensive retraining or suffer from latency bottlenecks and poor temporal consistency across action chunks. We propose Pace-and-Path Correction, a training-free, closed-form inference-time operator that wraps any chunked-action VLA. From a single quadratic cost, joint minimization yields a unified solution that decomposes orthogonally into two distinct channels. The pace channel compresses execution along the planned direction, while the path channel applies an…
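The abstract states that the solution decomposes orthogonally into two channels, one of which acts along the planned direction. The sketch below illustrates only the generic idea of splitting a vector into a component parallel to a planned direction and a component orthogonal to it. It is not the paper's operator: the function name `split_along_and_orthogonal`, the example vectors, and the reading of the parallel part as pace-like are all assumptions made for illustration.

```python
def split_along_and_orthogonal(direction, vector):
    """Split `vector` into a part parallel to `direction` and a part orthogonal to it.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    norm_sq = sum(d * d for d in direction)
    if norm_sq == 0.0:
        raise ValueError("direction must be non-zero")
    # Scalar projection coefficient of `vector` onto `direction`.
    scale = sum(v * d for v, d in zip(vector, direction)) / norm_sq
    along = [scale * d for d in direction]
    orthogonal = [v - a for v, a in zip(vector, along)]
    return along, orthogonal


# Assumed example values: a planned motion direction and an observed offset.
planned_direction = [1.0, 0.0, 0.0]
observed_offset = [0.3, -0.2, 0.1]

along, orthogonal = split_along_and_orthogonal(planned_direction, observed_offset)

# The two parts sum back to the original vector and are mutually orthogonal.
dot = sum(a * o for a, o in zip(along, orthogonal))
```

Because the two parts are orthogonal, a correction applied to one does not change the other, which is the property that lets a single solution be read as two separate channels.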
