GoalVLM: VLM-driven Object Goal Navigation for Multi-Agent System

MoniJesu James; Amir Atef Habel; Aleksey Fedoseev; and Dzmitry Tsetserokou

arXiv:2603.18210·cs.RO·March 20, 2026

Open Access

TL;DR

GoalVLM introduces a multi-agent framework that leverages vision-language models and spatial reasoning for zero-shot, open-vocabulary object navigation, outperforming prior methods without task-specific training.

Contribution

GoalVLM integrates VLMs with spatial and semantic reasoning for multi-agent object navigation, enabling zero-shot generalization to novel goals.

Findings

01. Achieves a 55.8% subtask success rate on GOAT-Bench

02. Outperforms prior state-of-the-art methods without task-specific training

03. Validates the importance of VLM-guided reasoning and localization

Abstract

Object-goal navigation has traditionally been limited to ground robots with closed-set object vocabularies. Existing multi-agent approaches depend on precomputed probabilistic graphs tied to fixed category sets, precluding generalization to novel goals at test time. We present GoalVLM, a cooperative multi-agent framework for zero-shot, open-vocabulary object navigation. GoalVLM integrates a Vision-Language Model (VLM) directly into the decision loop, SAM3 for text-prompted detection and segmentation, and SpaceOM for spatial reasoning, enabling agents to interpret free-form language goals and score frontiers via zero-shot semantic priors without retraining. Each agent builds a BEV semantic map from depth-projected voxel splatting, while a Goal Projector back-projects detections through calibrated depth into the map for reliable goal localization. A constraint-guided reasoning layer…
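The goal-localization step described in the abstract (back-projecting a detection through depth into a BEV map) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a pinhole camera, a level camera mounted at the agent origin, a planar agent pose (x, y, yaw), and a uniform BEV grid. All names here (Intrinsics, backproject, camera_to_world_xy, world_to_cell, project_goal) and all parameter values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Hypothetical pinhole intrinsics, in pixels."""
    fx: float
    fy: float
    cx: float
    cy: float


def backproject(u, v, depth_m, K):
    """Back-project pixel (u, v) at metric depth into the camera frame.

    Assumed camera convention: x right, y down, z forward.
    """
    x = (u - K.cx) * depth_m / K.fx
    y = (v - K.cy) * depth_m / K.fy
    return x, y, depth_m


def camera_to_world_xy(p_cam, agent_x, agent_y, agent_yaw):
    """Map a camera-frame point onto world ground-plane coordinates.

    Assumes a level camera at the agent origin; at yaw 0 the agent
    looks along world +x, and world +y points to its left.
    """
    x_c, _, z_c = p_cam
    forward, left = z_c, -x_c
    wx = agent_x + forward * math.cos(agent_yaw) - left * math.sin(agent_yaw)
    wy = agent_y + forward * math.sin(agent_yaw) + left * math.cos(agent_yaw)
    return wx, wy


def world_to_cell(wx, wy, origin_x, origin_y, resolution_m):
    """Convert world coordinates to a (row, col) cell of the BEV grid."""
    col = math.floor((wx - origin_x) / resolution_m)
    row = math.floor((wy - origin_y) / resolution_m)
    return row, col


def project_goal(u, v, depth_m, K, pose, origin, resolution_m):
    """Chain the three steps: pixel -> camera frame -> world -> BEV cell."""
    p_cam = backproject(u, v, depth_m, K)
    wx, wy = camera_to_world_xy(p_cam, *pose)
    return world_to_cell(wx, wy, origin[0], origin[1], resolution_m)
```

In this sketch, a detection at the image centre with 2 m depth lands 2 m straight ahead of the agent, and a detection to the right of centre lands on the agent's right in the map. A real system would typically aggregate depth over the whole detection mask and filter invalid depth readings instead of using a single pixel.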

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning