DAM-VLA: A Dynamic Action Model-Based Vision-Language-Action Framework for Robot Manipulation
Xiongfeng Peng, Jiaqian Yu, Dingzhe Li, Yixiang Jin, Lu Xu, Yamin Mao, Chao Zhang, Weiming Li, Sujin Jang, Dongwook Lee, Daehyun Ji

TL;DR
DAM-VLA introduces a dynamic, task-specific vision-language-action framework for robots that effectively combines high-level reasoning with precise manipulation, enabling better performance in complex, real-world tasks.
Contribution
DAM-VLA integrates VLM reasoning with diffusion-based action models and adds an action routing and dual-scale weighting mechanism to improve robot manipulation.
Findings
Outperforms state-of-the-art VLA methods in success rates
Demonstrates robustness in long-horizon and contact-rich tasks
Achieves high generalization from simple to complex tasks
Abstract
In dynamic environments such as warehouses, hospitals, and homes, robots must seamlessly transition between gross motion and precise manipulations to complete complex tasks. However, current Vision-Language-Action (VLA) frameworks, largely adapted from pre-trained Vision-Language Models (VLMs), often struggle to reconcile general task adaptability with the specialized precision required for intricate manipulation. To address this challenge, we propose DAM-VLA, a dynamic action model-based VLA framework. DAM-VLA integrates VLM reasoning with diffusion-based action models specialized for arm and gripper control. Specifically, it introduces (i) an action routing mechanism, using task-specific visual and linguistic cues to select appropriate action models (e.g., arm movement or gripper manipulation), (ii) a dynamic action model that fuses high-level VLM cognition with low-level visual…
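To make the action routing idea in (i) concrete, here is a minimal sketch, not the authors' implementation. It assumes the router has already reduced the visual and linguistic cues to one scalar score per action model. The names `softmax`, `route_action_model`, and the `"arm"`/`"gripper"` keys are hypothetical and chosen only for illustration; how DAM-VLA actually computes or uses such scores is not stated in the text above.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_action_model(cue_scores: Dict[str, float]) -> str:
    """Pick the action model with the highest routing probability.

    cue_scores maps a (hypothetical) action-model name to a scalar score
    that is assumed to come from fused visual and linguistic features.
    """
    names = list(cue_scores)
    probs = softmax([cue_scores[n] for n in names])
    best = max(range(len(names)), key=lambda i: probs[i])
    return names[best]


# Illustrative scores only; real scores would come from a learned router.
selected = route_action_model({"arm": 2.0, "gripper": 0.5})
```

A hard argmax is used here for simplicity; a learned router could equally use the probabilities as soft weights over the action models.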
Taxonomy
Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Social Robot Interaction and HRI
