DAM-VLA: A Dynamic Action Model-Based Vision-Language-Action Framework for Robot Manipulation

Xiongfeng Peng; Jiaqian Yu; Dingzhe Li; Yixiang Jin; Lu Xu; Yamin Mao; Chao Zhang; Weiming Li; Sujin Jang; Dongwook Lee; Daehyun Ji

arXiv:2603.00926 · cs.RO · March 3, 2026

TL;DR

DAM-VLA is a vision-language-action framework for robot manipulation that routes each task to specialized, diffusion-based action models for arm and gripper control, combining high-level VLM reasoning with precise low-level manipulation in complex, real-world tasks.

Contribution

It proposes a novel integration of VLM reasoning with diffusion-based action models, featuring an action routing mechanism and a dual-scale weighting mechanism to improve robot manipulation.
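
The contribution pairs VLM reasoning with diffusion-based action models. The page does not say which sampler DAM-VLA uses, so the following is a minimal sketch assuming standard DDPM ancestral sampling with a linear noise schedule. The names `sample_action` and `make_oracle`, the 7-D arm-plus-gripper action layout, and the conditioning vector `cond` are all hypothetical. The oracle noise predictor knows the target action and ignores `cond`; it only stands in for a trained, feature-conditioned network so the sampler can be checked end to end.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical noise predictor signature: eps_model(x_t, t, cond) -> predicted noise.
NoisePredictor = Callable[[List[float], int, Sequence[float]], List[float]]


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> List[float]:
    """Linearly spaced noise variances beta_1..beta_T (a common DDPM choice)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas: Sequence[float]) -> List[float]:
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def sample_action(eps_model: NoisePredictor, cond: Sequence[float], action_dim: int,
                  betas: Sequence[float], rng: random.Random) -> List[float]:
    """DDPM ancestral sampling of one action vector (assumed sampler, not the paper's)."""
    alpha_bars = cumulative_alphas(betas)
    x = [rng.gauss(0.0, 1.0) for _ in range(action_dim)]  # x_T ~ N(0, I)
    for t in reversed(range(len(betas))):
        beta, alpha, alpha_bar = betas[t], 1.0 - betas[t], alpha_bars[t]
        eps = eps_model(x, t, cond)
        coef = beta / math.sqrt(1.0 - alpha_bar)
        # Posterior mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        mean = [(xi - coef * ei) / math.sqrt(alpha) for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(beta)  # sigma_t^2 = beta_t variant
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise on the final step
    return x


def make_oracle(target: Sequence[float], betas: Sequence[float]) -> NoisePredictor:
    """Hypothetical stand-in that returns the exact noise for a known clean action."""
    alpha_bars = cumulative_alphas(betas)

    def eps_model(x_t: List[float], t: int, cond: Sequence[float]) -> List[float]:
        ab = alpha_bars[t]
        return [(xi - math.sqrt(ab) * x0) / math.sqrt(1.0 - ab)
                for xi, x0 in zip(x_t, target)]

    return eps_model


betas = linear_beta_schedule(50)
target = [0.05, -0.02, 0.10, 0.0, 0.0, 0.3, 1.0]  # hypothetical 6 arm dims + 1 gripper dim
action = sample_action(make_oracle(target, betas), cond=[0.0], action_dim=7,
                       betas=betas, rng=random.Random(0))
```

Because the final step adds no noise and the oracle's prediction is exact, the sampler returns the target action up to floating-point error; with a learned predictor the output would instead be a sample from the model's action distribution.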

Findings

01

Achieves higher success rates than state-of-the-art VLA methods

02

Demonstrates robustness in long-horizon and contact-rich tasks

03

Generalizes well from simple to complex tasks

Abstract

In dynamic environments such as warehouses, hospitals, and homes, robots must seamlessly transition between gross motion and precise manipulation to complete complex tasks. However, current Vision-Language-Action (VLA) frameworks, largely adapted from pre-trained Vision-Language Models (VLMs), often struggle to reconcile general task adaptability with the specialized precision required for intricate manipulation. To address this challenge, we propose DAM-VLA, a dynamic action model-based VLA framework. DAM-VLA integrates VLM reasoning with diffusion-based action models specialized for arm and gripper control. Specifically, it introduces (i) an action routing mechanism, using task-specific visual and linguistic cues to select appropriate action models (e.g., arm movement or gripper manipulation), (ii) a dynamic action model that fuses high-level VLM cognition with low-level visual…
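
The action routing mechanism selects between action models such as arm movement and gripper manipulation using task-specific visual and linguistic cues. How DAM-VLA scores those cues is not described here, so this is a hypothetical sketch: a linear probe (`router`) maps a cue vector to one logit per action model, a softmax turns the logits into weights, and the router either picks the top-weighted model (hard routing) or blends all outputs by weight (soft routing). The function names, the two-element cue vector, and the 7-D action are illustrative assumptions, not the paper's design.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical: an action model maps a conditioning feature vector to an action vector.
ActionModel = Callable[[Sequence[float]], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def routing_weights(cues: Sequence[float],
                    router: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Score each action model with a linear probe on the cue vector (hypothetical router)."""
    names = list(router)
    logits = [sum(w * c for w, c in zip(router[n], cues)) for n in names]
    return dict(zip(names, softmax(logits)))


def route_action(cues: Sequence[float], features: Sequence[float],
                 router: Dict[str, Sequence[float]], models: Dict[str, ActionModel],
                 hard: bool = True) -> Tuple[str, List[float]]:
    """Pick the top-weighted action model (hard) or blend all outputs by weight (soft)."""
    weights = routing_weights(cues, router)
    if hard:
        chosen = max(weights, key=weights.get)
        return chosen, models[chosen](features)
    outputs = {n: models[n](features) for n in models}
    dim = len(next(iter(outputs.values())))
    blended = [sum(weights[n] * outputs[n][i] for n in models) for i in range(dim)]
    return "blend", blended


# Toy 7-D action: 6 arm dimensions + 1 gripper dimension (assumed layout).
models = {
    "arm": lambda f: [0.1] * 6 + [0.0],      # hypothetical arm-movement model
    "gripper": lambda f: [0.0] * 6 + [1.0],  # hypothetical gripper model
}
router = {"arm": [1.0, 0.0], "gripper": [0.0, 1.0]}  # would be learned in practice
cues = [0.2, 2.0]  # e.g. a grasping instruction raises the second cue

name, action = route_action(cues, [0.0], router, models)  # name == "gripper"
_, blended = route_action(cues, [0.0], router, models, hard=False)
```

Hard routing keeps a single specialized model in charge of each step, while soft routing trades that specialization for smoother transitions between models; which of the two DAM-VLA uses is not stated on this page.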

Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Social Robot Interaction and HRI