Asynchronous Fast-Slow Vision-Language-Action Policies for Whole-Body Robotic Manipulation

Teqiang Zou; Hongliang Zeng; Yuxuan Nong; Yifan Li; Kehui Liu; Haotian Yang; Xinyang Ling; Xin Li; Lianyang Ma

arXiv:2512.20188 · cs.RO · December 24, 2025

TL;DR

This paper presents DuoCore-FS, an asynchronous vision-language-action framework for robotic manipulation that improves real-time control and task success by decoupling high-frequency action generation from slower semantic reasoning.

Contribution

An asynchronous architecture with a latent buffer and an action tokenizer enables faster whole-body robot control while keeping the policy trainable end to end.

Findings

01 Achieves 30 Hz action generation with a 3B-parameter VLM.

02 Improves task success rates in real-world manipulation.

03 Improves responsiveness relative to synchronous models.

Abstract

Most Vision-Language-Action (VLA) systems integrate a Vision-Language Model (VLM) for semantic reasoning with an action expert generating continuous action signals, yet both typically run at a single unified frequency. As a result, policy performance is constrained by the low inference speed of large VLMs. This mandatory synchronous execution severely limits control stability and real-time performance in whole-body robotic manipulation, which involves more joints, larger motion spaces, and dynamically changing views. We introduce a truly asynchronous Fast-Slow VLA framework (DuoCore-FS), organizing the system into a fast pathway for high-frequency action generation and a slow pathway for rich VLM reasoning. The system is characterized by two key features. First, a latent representation buffer bridges the slow and fast systems. It stores instruction semantics and action-reasoning…
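The decoupling described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: `LatentBuffer`, `slow_pathway`, `fast_pathway`, `run_demo`, the dictionary latent, and the sleep periods are all hypothetical stand-ins chosen for illustration. The only idea taken from the abstract is that a buffer sits between a slow reasoning loop and a fast action loop, so the fast loop reads the most recent latent instead of waiting for VLM inference.

```python
import threading
import time


class LatentBuffer:
    """Thread-safe holder for the most recent latent (hypothetical design)."""

    def __init__(self, initial_latent):
        self._lock = threading.Lock()
        self._latent = initial_latent
        self._version = 0  # incremented on every write

    def write(self, latent):
        with self._lock:
            self._latent = latent
            self._version += 1

    def read(self):
        with self._lock:
            return self._latent, self._version


def slow_pathway(buffer, stop_event, period_s=0.02):
    """Stand-in for slow VLM reasoning: refreshes the latent at a low rate."""
    step = 0
    while not stop_event.is_set():
        time.sleep(period_s)  # placeholder for VLM inference latency
        step += 1
        buffer.write({"plan_step": step})


def fast_pathway(buffer, n_steps, period_s=0.002):
    """Stand-in for the action expert: reads the buffer, never waits on the VLM."""
    versions_seen = []
    for _ in range(n_steps):
        _latent, version = buffer.read()  # returns whatever is currently stored
        versions_seen.append(version)
        time.sleep(period_s)  # placeholder for action generation
    return versions_seen


def run_demo(n_steps=20):
    """Run both loops concurrently and return the latent versions the fast loop saw."""
    buffer = LatentBuffer(initial_latent={"plan_step": 0})
    stop_event = threading.Event()
    slow = threading.Thread(target=slow_pathway, args=(buffer, stop_event), daemon=True)
    slow.start()
    versions = fast_pathway(buffer, n_steps)
    stop_event.set()
    slow.join()
    return versions


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the fast loop typically sees the same latent version over several consecutive steps, which is the sense in which its rate is independent of the slow loop. A synchronous design would instead call the slow model inside every control step.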

Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Reinforcement Learning in Robotics