Hypo3D: Exploring Hypothetical Reasoning in 3D

Ye Mao; Weixun Luo; Junpeng Jing; Anlan Qiu; Krystian Mikolajczyk

arXiv:2502.00954 · cs.CV · May 29, 2025

TL;DR

Hypo3D is a benchmark that evaluates whether vision-language models can reason about hypothetical changes to 3D scenes without access to real-time scene data. It highlights the limitations of current models on such tasks.

Contribution

This paper presents Hypo3D, the first benchmark for 3D hypothetical reasoning, and demonstrates a significant performance gap between state-of-the-art models and humans on this task.

Findings

01. State-of-the-art models perform poorly on Hypo3D tasks.

02. Models often fail to accurately reason about scene changes.

03. Humans outperform models significantly in hypothetical 3D reasoning.

Abstract

The rise of vision-language foundation models marks an advancement in bridging the gap between human and machine capabilities in 3D scene reasoning. Existing 3D reasoning benchmarks assume real-time scene accessibility, which is impractical due to the high cost of frequent scene updates. To this end, we introduce Hypothetical 3D Reasoning, namely Hypo3D, a benchmark designed to evaluate models' ability to reason without access to real-time scene data. Models need to imagine the scene state based on a provided change description before reasoning. Hypo3D is formulated as a 3D Visual Question Answering (VQA) benchmark, comprising 7,727 context changes across 700 indoor scenes, resulting in 14,885 question-answer pairs. An anchor-based world frame is established for all scenes, ensuring consistent reference to a global frame for directional terms in context changes and QAs. Extensive…
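The task setup described in the abstract can be illustrated with a minimal sketch: each item pairs a context change with a question, and the model sees the change description before the question. The class name, field names, prompt layout, and the example item below are all assumptions made for illustration; they are not the dataset's actual schema or the authors' prompt format. Only the three counts are taken from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HypoSample:
    """One benchmark item. Field names are hypothetical, not the real schema."""
    scene_id: str
    context_change: str
    question: str
    answer: str


def build_prompt(sample: HypoSample) -> str:
    """Place the change description before the question, so the change
    must be applied to the scene before the question is answered."""
    return (
        f"Scene: {sample.scene_id}\n"
        f"Context change: {sample.context_change}\n"
        f"Question: {sample.question}\n"
        "Answer:"
    )


# Counts reported in the abstract.
NUM_SCENES = 700
NUM_CHANGES = 7_727
NUM_QA_PAIRS = 14_885

# Derived averages (simple division; not figures stated by the authors).
changes_per_scene = NUM_CHANGES / NUM_SCENES
qa_per_change = NUM_QA_PAIRS / NUM_CHANGES

# Invented example item, for illustration only.
sample = HypoSample(
    scene_id="scene_0001",
    context_change="The chair has been moved to the left of the table.",
    question="What is now to the left of the table?",
    answer="chair",
)

print(build_prompt(sample))
print(f"Average context changes per scene: {changes_per_scene:.2f}")
print(f"Average QA pairs per context change: {qa_per_change:.2f}")
```

On the reported counts, this works out to roughly 11 context changes per scene and just under 2 question-answer pairs per change.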

Videos

Hypo3D: Exploring Hypothetical Reasoning in 3D · SlidesLive

Taxonomy

Topics: Semantic Web and Ontologies