GeneralVLA: Generalizable Vision-Language-Action Models with Knowledge-Guided Trajectory Planning
Guoqing Ma, Siheng Wang, Zeyu Zhang, Shan Yu, Hao Tang

TL;DR
GeneralVLA introduces a hierarchical vision-language-action model that leverages foundation models and knowledge-guided trajectory planning to enable zero-shot robotic manipulation without real-world data or human demonstrations.
Contribution
The paper presents a novel hierarchical VLA model that combines affordance perception, task understanding, and trajectory planning for scalable zero-shot robotic manipulation.
Findings
Successfully generates trajectories for 14 tasks.
Outperforms state-of-the-art methods such as VoxPoser.
Produces more robust behavior cloning policies.
Abstract
Large foundation models have shown strong open-world generalization to complex problems in vision and language, but similar levels of generalization have yet to be achieved in robotics. One fundamental challenge is that the models exhibit limited zero-shot capability, which hampers their ability to generalize effectively to unseen scenarios. In this work, we propose GeneralVLA (Generalizable Vision-Language-Action Models with Knowledge-Guided Trajectory Planning), a hierarchical vision-language-action (VLA) model that utilizes the generalization of foundation models more effectively, enabling zero-shot manipulation and automatically generating data for robotics. In particular, we study a class of hierarchical VLA models in which the high-level ASM (Affordance Segmentation Module) is finetuned to perceive image keypoint affordances of the scene; the mid-level 3DAgent carries out…
Taxonomy
Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Robotic Path Planning Algorithms
