Space3D-Bench: Spatial 3D Question Answering Benchmark

Emilia Szymanska; Mihai Dusmanu; Jan-Willem Buurlage; Mahdi Rad; Marc Pollefeys

arXiv:2408.16662·cs.CV·September 17, 2024



TL;DR

Space3D-Bench introduces a 3D question-answering dataset of 1000 spatial questions over Replica scenes, backed by multiple data modalities, together with a VLM-based assessment system for evaluating how well models understand spatial environments.

Contribution

It provides a balanced dataset of 1000 3D spatial questions spanning multiple data modalities, and a novel assessment system that uses vision-language models to grade answers.

Findings

1. The baseline system, RAG3D-Chat, achieves 67% accuracy on the benchmark.

2. The dataset covers a wide range of 3D spatial reasoning tasks.

3. The assessment system effectively grades free-form natural-language responses.
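Accuracy here can be read as the share of questions whose answer the assessment system accepts as correct. The sketch below shows that aggregation, overall and per taxonomy category, assuming each question receives a binary verdict and carries a category label. The `Verdict` type, its fields, the category strings and the toy data are hypothetical, not the benchmark's actual output format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    """Hypothetical per-question grade produced by an automatic assessor."""
    question_id: int
    category: str   # hypothetical taxonomy label, e.g. "measurement"
    correct: bool   # True if the assessor accepted the answer


def accuracy(verdicts: list[Verdict]) -> float:
    """Fraction of questions graded correct; 0.0 for an empty list."""
    if not verdicts:
        return 0.0
    return sum(v.correct for v in verdicts) / len(verdicts)


def accuracy_by_category(verdicts: list[Verdict]) -> dict[str, float]:
    """Accuracy computed separately for each taxonomy category."""
    groups: dict[str, list[Verdict]] = defaultdict(list)
    for v in verdicts:
        groups[v.category].append(v)
    return {cat: accuracy(vs) for cat, vs in groups.items()}


if __name__ == "__main__":
    # Toy verdicts, invented purely for illustration.
    toy = [
        Verdict(0, "measurement", True),
        Verdict(1, "measurement", False),
        Verdict(2, "navigation", True),
    ]
    print(f"overall: {accuracy(toy):.2f}")
    print(accuracy_by_category(toy))
```

Reporting per-category accuracy next to the overall figure shows which kinds of spatial question a system struggles with, which a single percentage hides.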

Abstract

Answering questions about the spatial properties of the environment poses challenges for existing language and vision foundation models due to their lack of understanding of the 3D world, notably of the relationships between objects. To push the field forward, multiple 3D Q&A datasets have been proposed; together they provide a variety of questions, but individually they focus on particular aspects of 3D reasoning or are limited in terms of data modalities. To address this, we present Space3D-Bench, a collection of 1000 general spatial questions and answers related to scenes of the Replica dataset, which offers a variety of data modalities: point clouds, posed RGB-D images, navigation meshes and 3D object detections. To ensure that the questions cover a wide range of 3D objectives, we propose an indoor spatial questions taxonomy inspired by geographic information systems and use it to…
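The Contribution note describes the dataset as balanced across spatial question types. A minimal sketch of how one might represent a question-answer entry and check that balance is shown below, assuming each entry records its Replica scene and one taxonomy category. The `QAEntry` class, its fields, the `imbalance_ratio` helper and the category labels are hypothetical; the scene names are only illustrative.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class QAEntry:
    """Hypothetical record for one question-answer pair."""
    scene: str      # Replica scene identifier (naming assumed)
    category: str   # taxonomy category (labels assumed)
    question: str
    answer: str


def category_counts(entries: list[QAEntry]) -> Counter:
    """Number of questions per taxonomy category."""
    return Counter(e.category for e in entries)


def imbalance_ratio(entries: list[QAEntry]) -> float:
    """Largest category count divided by smallest; 1.0 means perfectly balanced."""
    counts = category_counts(entries)
    if not counts:
        raise ValueError("no entries to check")
    return max(counts.values()) / min(counts.values())


if __name__ == "__main__":
    # Toy entries, invented purely for illustration.
    entries = [
        QAEntry("room_0", "measurement", "How tall is the shelf?", "About 2 m."),
        QAEntry("room_0", "navigation", "How do I reach the sofa?", "Walk past the table."),
        QAEntry("office_0", "measurement", "How wide is the desk?", "About 1.5 m."),
    ]
    print(category_counts(entries))
    print(imbalance_ratio(entries))
```

A ratio close to 1.0 indicates that no category dominates; the check only counts questions and says nothing about their difficulty.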


Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Advanced Neural Network Applications

Methods: Focus