Enabling Dynamic Tracking in Vision-Language-Action Models via Time-Discrete and Time-Continuous Velocity Feedforward
Johannes Hechtl, Philipp Schmitt, Georg von Wichert, Wolfram Burgard

TL;DR
This paper introduces methods to incorporate velocity feedforward terms into vision-language-action (VLA) models for robot manipulation, improving responsiveness and safety in contact-rich tasks by bridging the gap between compliance and precision.
Contribution
It proposes two model-agnostic approaches for integrating velocity targets into VLA policies, a time-discrete finite-difference approximation and a continuous cubic B-Spline action space, improving responsiveness without extensive architecture changes.
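Both approaches turn a policy's output into a velocity target that a compliant controller can consume. The following is a minimal sketch of each under our own assumptions, not the authors' implementation. The finite-difference helper differentiates a chunk of discrete poses sampled every `dt` seconds, using central differences inside the chunk and one-sided differences at its ends (the paper's exact stencil is not given here). The B-spline helper evaluates one segment of a uniform cubic B-spline, whose analytic derivative yields velocity directly. The function names `finite_difference_velocities` and `cubic_bspline_eval`, the plain-vector pose representation, and the uniform knot spacing `knot_dt` are all hypothetical.

```python
from typing import List, Sequence, Tuple

Vec = List[float]


def finite_difference_velocities(poses: Sequence[Sequence[float]], dt: float) -> List[Vec]:
    """Hypothetical helper: velocity targets from a chunk of discrete poses.

    Poses are treated as plain vectors (e.g. Cartesian positions) sampled every
    dt seconds; orientations would need differencing on SO(3) instead.
    """
    n = len(poses)
    if n < 2:
        raise ValueError("need at least two poses to take a difference")
    vels: List[Vec] = []
    for k in range(n):
        if k == 0:                       # forward difference at the start
            a, b, h = poses[0], poses[1], dt
        elif k == n - 1:                 # backward difference at the end
            a, b, h = poses[-2], poses[-1], dt
        else:                            # central difference in the interior
            a, b, h = poses[k - 1], poses[k + 1], 2.0 * dt
        vels.append([(bj - aj) / h for aj, bj in zip(a, b)])
    return vels


def cubic_bspline_eval(ctrl: Sequence[Sequence[float]], seg: int, u: float,
                       knot_dt: float) -> Tuple[Vec, Vec]:
    """Hypothetical helper: position and velocity on one segment of a uniform
    cubic B-spline.

    Segment `seg` is shaped by ctrl[seg:seg + 4]; u in [0, 1] is the local
    parameter and knot_dt the (assumed uniform) time between knots in seconds.
    """
    if seg < 0 or seg + 4 > len(ctrl):
        raise IndexError("segment needs four control points")
    p0, p1, p2, p3 = ctrl[seg:seg + 4]
    # Uniform cubic B-spline basis functions; they sum to 1 for any u.
    basis = [(1 - u) ** 3 / 6,
             (3 * u**3 - 6 * u**2 + 4) / 6,
             (-3 * u**3 + 3 * u**2 + 3 * u + 1) / 6,
             u**3 / 6]
    # Analytic derivatives of the basis with respect to u; they sum to 0.
    dbasis = [-(1 - u) ** 2 / 2,
              (3 * u**2 - 4 * u) / 2,
              (-3 * u**2 + 2 * u + 1) / 2,
              u**2 / 2]
    pos: Vec = []
    vel: Vec = []
    for coords in zip(p0, p1, p2, p3):   # one tuple of 4 values per dimension
        pos.append(sum(w * c for w, c in zip(basis, coords)))
        # Chain rule: d/dt = (d/du) / knot_dt.
        vel.append(sum(w * c for w, c in zip(dbasis, coords)) / knot_dt)
    return pos, vel
```

For a chunk moving at a constant 2 m/s, such as `[[2.0 * k * 0.1] for k in range(5)]` with `dt=0.1`, every returned finite-difference velocity is approximately `[2.0]`. For control points lying on a straight line, the spline velocity is constant and equals the control-point spacing divided by `knot_dt`. The practical difference is that the finite-difference target is only as smooth as the predicted poses, while the spline derivative is continuous by construction.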
Findings
The finite-difference approach improves task speed.
The B-Spline method maintains high success rates.
Both methods are compatible with standard architectures.
Abstract
While vision-language-action (VLA) models have shown great promise for robot manipulation, their deployment on rigid industrial robots remains challenging due to the inherent trade-off between compliance and responsiveness. Standard Behavior Cloning (BC) approaches predict discrete poses at low frequencies, omitting the velocity and acceleration feedforward terms typically used by low-level compliant controllers. This forces the controller to rely on high stiffness for accurate tracking, thereby sacrificing safe contact dynamics. In this paper, we demonstrate the importance of integrating velocity feedforward terms into VLA policies to resolve this trade-off. We propose two methods for extracting velocity targets from VLAs: a time-discrete finite-difference approximation that serves as a highly effective bridge for existing models, and a continuous Cubic B-Spline action space that natively yields…
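The stiffness trade-off described in the abstract can be made concrete with a one-degree-of-freedom impedance law F = K(x_d - x) + D(v_d - v). The following is a minimal sketch that assumes a unit point mass, a constant-velocity ramp reference, and illustrative gains that do not come from the paper. When the policy supplies only a pose target (v_d = 0), the damping term resists the motion, so the steady-state lag is D*v/K and can shrink only if K is raised. Passing the reference velocity as v_d removes the lag without stiffening the controller. The function `simulate_tracking` and all of its parameters are hypothetical.

```python
def simulate_tracking(stiffness: float, damping: float, use_feedforward: bool,
                      ref_speed: float = 0.2, mass: float = 1.0,
                      dt: float = 1e-3, duration: float = 5.0) -> float:
    """Hypothetical 1-DoF example: a point mass tracks the ramp
    x_d(t) = ref_speed * t under F = K (x_d - x) + D (v_d - v).

    Returns the tracking error x_d - x at the end of the run.
    """
    x, v = 0.0, 0.0
    steps = int(round(duration / dt))
    for k in range(steps):
        x_d = ref_speed * k * dt
        # Velocity feedforward: pass the reference velocity, or 0 when the
        # policy only provides a pose target (plain BC pose output).
        v_d = ref_speed if use_feedforward else 0.0
        force = stiffness * (x_d - x) + damping * (v_d - v)
        # Semi-implicit (symplectic) Euler integration of mass * accel = force.
        v += force / mass * dt
        x += v * dt
    return ref_speed * steps * dt - x
```

With K = 100 and D = 20, the lag without feedforward settles at D*v/K = 20 * 0.2 / 100 = 0.04. Doubling K to 200 halves it to 0.02, while enabling feedforward at the original gains drives it to approximately zero. In its simplest form, this is the choice the abstract describes: the controller must either stiffen or receive a velocity target.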
Taxonomy
Topics: Robot Manipulation and Learning · Teleoperation and Haptic Systems · Advanced Vision and Imaging
