VLA-RAIL: A Real-Time Asynchronous Inference Linker for VLA Models and Robots
Yongsheng Zhao, Lei Zhao, Baoping Cheng, Gongxin Yao, Xuanzhang Wen, Han Gao

TL;DR
VLA-RAIL is a framework that enables real-time, smooth, and high-speed robot motion by running vision-language-action (VLA) model inference asynchronously from motion control and fusing successive action chunks, which reduces jitter and improves task success.
Contribution
The paper introduces VLA-RAIL, a novel asynchronous inference framework with trajectory smoothing and chunk fusion for improved robotic VLA model performance.
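Chunk fusion means merging the not-yet-executed tail of the previous action chunk with a freshly inferred chunk so the commanded trajectory does not jump. The sketch below illustrates one common way to do this: a linear cross-fade over the overlapping steps. This is an assumption for illustration only. It is not the paper's fusion rule, which the abstract below does not spell out. The function name `fuse_chunks` and the linear weighting are hypothetical.

```python
from typing import List, Sequence

Action = List[float]  # one action vector, e.g. joint targets (assumed layout)


def fuse_chunks(old_tail: Sequence[Action], new_chunk: Sequence[Action]) -> List[Action]:
    """Hypothetical linear cross-fade between two action chunks.

    old_tail:  remaining, not-yet-executed actions of the previous chunk.
    new_chunk: newly inferred chunk; its first len(old_tail) steps are
               assumed to overlap old_tail in time.
    """
    n = min(len(old_tail), len(new_chunk))
    fused: List[Action] = []
    for i in range(n):
        # Weight on the new chunk ramps from 1/(n+1) up to n/(n+1),
        # so control shifts gradually from the old plan to the new one.
        w = (i + 1) / (n + 1)
        fused.append([(1.0 - w) * a + w * b for a, b in zip(old_tail[i], new_chunk[i])])
    # Past the overlap, follow the new chunk unchanged (copied, not aliased).
    fused.extend(list(a) for a in new_chunk[n:])
    return fused


if __name__ == "__main__":
    old = [[0.0], [0.0], [0.0]]
    new = [[4.0], [4.0], [4.0], [4.0]]
    print(fuse_chunks(old, new))  # [[1.0], [2.0], [3.0], [4.0]]
```

The cross-fade turns a step change from 0 to 4 into a ramp of 1, 2, 3, 4. That ramp is the kind of discontinuity removal that chunk fusion targets. Other weightings (exponential, or smoothing applied after blending) are equally plausible.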
Findings
Reduces motion jitter in robotic actions
Increases execution speed of robot tasks
Improves success rates in manipulation tasks
Abstract
Vision-Language-Action (VLA) models have achieved remarkable breakthroughs in robotics, with the action chunk playing a dominant role in these advances. Given the real-time and continuous nature of robotic motion control, the strategy used to fuse a queue of successive action chunks has a profound impact on the overall performance of VLA models. Existing methods suffer from jitter, stalling, or even pauses in robot action execution, which not only limit the achievable execution speed but also reduce the overall task success rate. This paper introduces VLA-RAIL (A Real-Time Asynchronous Inference Linker), a novel framework designed to address these issues by conducting model inference and robot motion control asynchronously and guaranteeing smooth, continuous, and high-speed action execution. The core contributions of the paper are twofold: a Trajectory Smoother that…
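Decoupling inference from control means the control loop keeps issuing one action per tick while the model is still computing the next chunk. When a chunk arrives, its first few steps are already stale, because they were planned for ticks that have passed during inference. The discrete-time simulation below illustrates that idea. It is not VLA-RAIL's implementation. The names `simulate_async_control` and `policy` are hypothetical. Three further assumptions apply:

- Inference takes a fixed `latency` in control ticks.
- The robot holds its last action (initially 0.0) whenever the buffer runs dry.
- Stale steps are simply skipped.

```python
from collections import deque
from typing import Callable, Deque, List, Optional, Tuple

# Hypothetical policy: given the tick of the observation, return a chunk
# of scalar actions planned for ticks t, t+1, t+2, ...
Policy = Callable[[int], List[float]]


def simulate_async_control(policy: Policy, steps: int, latency: int) -> List[float]:
    """Simulate a control loop that never blocks on model inference.

    An inference request launched at tick t becomes available at tick
    t + latency; the controller keeps executing buffered actions meanwhile.
    """
    buffer: Deque[float] = deque()
    pending: Optional[Tuple[int, List[float]]] = None  # (launch tick, chunk)
    executed: List[float] = []
    last = 0.0  # assumed initial command: robot at rest
    for tick in range(steps):
        if pending is None:
            pending = (tick, policy(tick))  # first request on the current observation
        start, chunk = pending
        if tick - start >= latency:
            # Result is ready: drop the actions whose ticks already elapsed.
            buffer = deque(chunk[tick - start:])
            pending = (tick, policy(tick))  # launch the next request right away
        if buffer:
            last = buffer.popleft()
        executed.append(last)  # hold the last action if nothing is buffered
    return executed


if __name__ == "__main__":
    plan = lambda t: [float(t + k) for k in range(8)]  # action value = target tick
    print(simulate_async_control(plan, steps=8, latency=2))
    # [0.0, 0.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
```

After the first result arrives, every executed action matches its own tick. The loop never pauses while the next chunk is being computed. Only the initial `latency` ticks are spent holding. Pure chunk replacement like this still switches plans abruptly when consecutive chunks disagree. Fusion and smoothing steps are meant to address exactly that.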
Taxonomy
Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Autonomous Vehicle Technology and Safety
