Communicating about Space: Language-Mediated Spatial Integration Across Partial Views

Ankur Sikarwar; Debangan Mishra; Sudarshan Nikhil; Ponnurangam Kumaraguru; Aishwarya Agrawal

arXiv:2603.27183 · cs.CV · April 2, 2026

TL;DR

This paper introduces COSMIC, a benchmark for evaluating whether multimodal language models can build a shared spatial understanding of a 3D environment through dialogue. Current models identify shared anchor objects reliably but struggle with relational reasoning and global map building.

Contribution

The paper presents COSMIC, a new benchmark for collaborative spatial communication, and systematically evaluates MLLMs' capabilities in forming shared spatial mental models.

Findings

01

MLLMs reliably identify shared anchor objects across views.

02

MLLMs perform poorly on relational reasoning and global map building.

03

Humans reach 95% accuracy through dialogue, while the best models reach only 72%, a gap of 23 percentage points.

Abstract

Humans build shared spatial understanding by communicating partial, viewpoint-dependent observations. We ask whether Multimodal Large Language Models (MLLMs) can do the same, aligning distinct egocentric views through dialogue to form a coherent, allocentric mental model of a shared environment. To study this systematically, we introduce COSMIC, a benchmark for Collaborative Spatial Communication. In this setting, two static MLLM agents observe a 3D indoor environment from different viewpoints and exchange natural-language messages to solve spatial queries. COSMIC contains 899 diverse scenes and 1,250 question-answer pairs spanning five tasks. We find a capability hierarchy: MLLMs are most reliable at identifying shared anchor objects across views, perform worse on relational reasoning, and largely fail at building globally consistent maps, performing near chance, even for frontier…

Code & Models

Repositories

ankursikarwar/Cosmic
github

Datasets

mair-lab/Cosmic
dataset · 595 downloads
