Asynchronous Fast-Slow Vision-Language-Action Policies for Whole-Body Robotic Manipulation
Teqiang Zou, Hongliang Zeng, Yuxuan Nong, Yifan Li, Kehui Liu, Haotian Yang, Xinyang Ling, Xin Li, Lianyang Ma

TL;DR
This paper presents DuoCore-FS, an asynchronous vision-language-action framework for robotic manipulation that improves real-time control and task success by decoupling high-frequency action generation from slower semantic reasoning.
Contribution
An asynchronous architecture that combines a latent representation buffer with an action tokenizer, enabling faster whole-body robot control while keeping the policy end-to-end trainable.
Findings
Achieves 30 Hz action generation with a 3B-parameter VLM.
Improves task success rates in real-world manipulation.
Enhances responsiveness over synchronous models.
Abstract
Most Vision-Language-Action (VLA) systems integrate a Vision-Language Model (VLM) for semantic reasoning with an action expert generating continuous action signals, yet both typically run at a single unified frequency. As a result, policy performance is constrained by the low inference speed of large VLMs. This mandatory synchronous execution severely limits control stability and real-time performance in whole-body robotic manipulation, which involves more joints, larger motion spaces, and dynamically changing views. We introduce a truly asynchronous Fast-Slow VLA framework (DuoCore-FS), organizing the system into a fast pathway for high-frequency action generation and a slow pathway for rich VLM reasoning. The system is characterized by two key features. First, a latent representation buffer bridges the slow and fast systems. It stores instruction semantics and action-reasoning…
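To make the buffer idea in the abstract concrete, here is a minimal sketch of a fast loop reading the most recent output of a slower loop through a shared single-slot buffer. This is not the authors' implementation. Every name in it (`LatentBuffer`, `Latent`, `slow_step`, `fast_step`, `run_ticks`) is hypothetical, the "latent" is a single float standing in for whatever the slow pathway stores, and the refresh ratio `slow_every` is an assumed parameter rather than a figure from the paper. The demo is driven by ticks in a single thread so its behavior is deterministic. The buffer uses a lock so the same class could be shared between real threads.

```python
import threading
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Latent:
    """Hypothetical stand-in for the slow pathway's output."""
    version: int
    value: float


class LatentBuffer:
    """Single-slot buffer: the writer overwrites, the reader gets the latest."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._latest: Optional[Latent] = None

    def write(self, latent: Latent) -> None:
        with self._lock:
            self._latest = latent

    def read(self) -> Optional[Latent]:
        with self._lock:
            return self._latest


def slow_step(version: int, observation: float) -> Latent:
    # Placeholder for slow semantic reasoning (assumption: echoes the input).
    return Latent(version=version, value=observation)


def fast_step(latent: Latent, observation: float) -> float:
    # Placeholder for the action expert: combines a possibly stale latent
    # with the freshest observation.
    return 0.5 * (latent.value - observation)


def run_ticks(n_fast_ticks: int, slow_every: int) -> List[Tuple[int, int]]:
    """Run the fast loop every tick; refresh the buffer every `slow_every` ticks.

    Returns (tick, latent_version) pairs showing which latent each action used.
    """
    buf = LatentBuffer()
    log: List[Tuple[int, int]] = []
    version = 0
    for tick in range(n_fast_ticks):
        obs = float(tick)
        if tick % slow_every == 0:
            buf.write(slow_step(version, obs))
            version += 1
        latent = buf.read()
        if latent is not None:
            fast_step(latent, obs)  # action would be sent to the robot here
            log.append((tick, latent.version))
    return log
```

Calling `run_ticks(6, 3)` shows each latent being reused across three consecutive fast ticks before the slow pathway replaces it. The point of the sketch is that the fast loop never waits for the slow one.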
Taxonomy
Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Reinforcement Learning in Robotics
