GeneralVLA: Generalizable Vision-Language-Action Models with Knowledge-Guided Trajectory Planning

Guoqing Ma; Siheng Wang; Zeyu Zhang; Shan Yu; Hao Tang

arXiv:2602.04315·cs.RO·February 5, 2026

Open Access

TL;DR

GeneralVLA introduces a hierarchical vision-language-action model that leverages foundation models and knowledge-guided trajectory planning to enable zero-shot robotic manipulation without real-world data or human demonstrations.

Contribution

The paper presents a novel hierarchical VLA model that combines affordance perception, task understanding, and trajectory planning for scalable zero-shot robotic manipulation.

Findings

01. Successfully generates trajectories for 14 tasks.

02. Outperforms state-of-the-art methods like VoxPoser.

03. Produces more robust behavior cloning policies.

Abstract

Large foundation models have shown strong open-world generalization to complex problems in vision and language, but similar levels of generalization have yet to be achieved in robotics. One fundamental challenge is that the models exhibit limited zero-shot capability, which hampers their ability to generalize effectively to unseen scenarios. In this work, we propose GeneralVLA (Generalizable Vision-Language-Action Models with Knowledge-Guided Trajectory Planning), a hierarchical vision-language-action (VLA) model that makes more effective use of the generalization of foundation models, enabling zero-shot manipulation and automatic data generation for robotics. In particular, we study a class of hierarchical VLA models in which the high-level ASM (Affordance Segmentation Module) is finetuned to perceive image keypoint affordances of the scene; the mid-level 3DAgent carries out…

Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Robotic Path Planning Algorithms