Structured Spatial Reasoning with Open Vocabulary Object Detectors
Negar Nejatishahidin, Madhukar Reddy Vongala, Jana Kosecka

TL;DR
This paper presents a structured probabilistic method that combines 3D geometric features with open-vocabulary object detectors to improve spatial reasoning for robotic perception, outperforming state-of-the-art Vision and Language Models (VLMs) by over 20% on spatial relation grounding.
Contribution
The paper introduces a structured probabilistic approach that integrates 3D geometric features with open-vocabulary object detectors to improve spatial reasoning for robotic perception.
Findings
The method outperforms the zero-shot performance of state-of-the-art VLMs by over 20% in spatial relation grounding.
The approach effectively combines 3D geometry with open-vocabulary detection for robotic perception.
Experimental validation on real-world and synthetic datasets demonstrates improved spatial reasoning accuracy.
Abstract
Reasoning about spatial relationships between objects is essential for many real-world robotic tasks, such as fetch-and-delivery, object rearrangement, and object search. The ability to detect and disambiguate different objects and identify their location is key to successful completion of these tasks. Several recent works have used powerful Vision and Language Models (VLMs) to unlock this capability in robotic agents. In this paper, we introduce a structured probabilistic approach that integrates rich 3D geometric features with state-of-the-art open-vocabulary object detectors to enhance spatial reasoning for robotic perception. The approach is evaluated against the zero-shot performance of state-of-the-art VLMs on spatial reasoning tasks. To enable this comparison, we annotate spatial clauses in real-world RGB-D Active Vision Dataset [1] and…
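To make the idea of combining detector outputs with 3D geometry concrete, here is a minimal illustrative sketch. It is not the authors' model: the `Detection` record, the `left_of_likelihood` and `ground_clause` helpers, the `scale` parameter, the sigmoid-shaped geometric likelihood, and the product-of-probabilities scoring are all hypothetical assumptions. It also assumes 3D centroids in a camera frame whose x axis points to the right. The sketch grounds a clause such as "the mug left of the laptop" by scoring every (target, anchor) pair of detections and returning the most probable one.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Detection:
    label: str                              # open-vocabulary class name (hypothetical record)
    score: float                            # detector confidence in [0, 1]
    centroid: Tuple[float, float, float]    # assumed camera frame: x right, y down, z forward (metres)


def left_of_likelihood(a: Detection, b: Detection, scale: float = 0.1) -> float:
    """Hypothetical soft likelihood that `a` lies left of `b`.

    Uses a logistic function of the horizontal gap; written with tanh so that
    large gaps cannot overflow math.exp.
    """
    dx = b.centroid[0] - a.centroid[0]      # positive when a has the smaller x (is further left)
    return 0.5 * (1.0 + math.tanh(dx / (2.0 * scale)))


def ground_clause(
    detections: List[Detection],
    target: str,
    anchor: str,
    relation: Callable[[Detection, Detection], float] = left_of_likelihood,
) -> Tuple[Optional[Tuple[Detection, Detection]], float]:
    """Return the (target, anchor) pair maximising detection * geometry probability.

    Assumes detector scores and the geometric term are independent, so they multiply.
    Returns (None, 0.0) when no pair with the requested labels exists.
    """
    best: Optional[Tuple[Detection, Detection]] = None
    best_p = 0.0
    for a in detections:
        if a.label != target:
            continue
        for b in detections:
            if b is a or b.label != anchor:
                continue
            p = a.score * b.score * relation(a, b)
            if p > best_p:
                best, best_p = (a, b), p
    return best, best_p
```

For example, with two mugs at x = -0.3 and x = 0.5 and a laptop at x = 0.1, `ground_clause(dets, "mug", "laptop")` selects the mug at x = -0.3. Keeping the geometric term separate from the detector confidence is the design choice the sketch is meant to highlight: new relations ("behind", "on top of") would only need another likelihood function over the 3D features.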
Taxonomy
Topics: Constraint Satisfaction and Optimization · Semantic Web and Ontologies · Geographic Information Systems Studies
