MIRAGE: A Multi-modal Benchmark for Spatial Perception, Reasoning, and Intelligence
Chonghan Liu, Haoran Wang, Felix Henry, Pu Miao, Yajie Zhang, Yu Zhao, Peiran Wu

TL;DR
MIRAGE is a comprehensive multi-modal benchmark designed to evaluate and advance models' abilities in spatial perception, relational reasoning, and object attribute recognition, addressing key gaps in current computer vision models.
Contribution
The paper introduces MIRAGE, a novel benchmark that evaluates models on Counting, Relation, and combined Counting with Relation tasks, exposing the limitations of current models in spatial reasoning.
Findings
Current models struggle with spatial relational reasoning.
MIRAGE reveals significant gaps in object attribute recognition.
The benchmark aims to promote the development of models with improved spatial reasoning.
Abstract
Spatial perception and reasoning are core components of human cognition, encompassing object recognition, spatial relational understanding, and dynamic reasoning. Despite progress in computer vision, existing benchmarks reveal significant gaps in models' abilities to accurately recognize object attributes and reason about spatial relationships, both essential for dynamic reasoning. To address these limitations, we propose MIRAGE, a multi-modal benchmark designed to evaluate models' capabilities in Counting (object attribute recognition), Relation (spatial relational reasoning), and Counting with Relation. Through diverse and complex scenarios requiring fine-grained recognition and reasoning, MIRAGE highlights critical limitations in state-of-the-art models, underscoring the need for improved representations and reasoning frameworks. By targeting these foundational abilities, MIRAGE…
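The abstract groups MIRAGE tasks into three categories: Counting, Relation, and Counting with Relation. The Python below is a minimal sketch of how results could be reported separately for each category. The `Item` record, its field names, and the exact-match scoring rule are hypothetical assumptions for illustration only. This page does not describe the paper's actual data format or evaluation metrics.

```python
from collections import defaultdict
from dataclasses import dataclass

# The three task categories named in the MIRAGE abstract.
CATEGORIES = ("Counting", "Relation", "Counting with Relation")


@dataclass
class Item:
    category: str    # one of CATEGORIES
    answer: str      # hypothetical ground-truth answer string
    prediction: str  # model output, assumed already mapped to the answer format


def per_category_accuracy(items):
    """Exact-match accuracy per task category (an assumed metric, not the paper's)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        if item.category not in CATEGORIES:
            raise ValueError(f"unknown category: {item.category}")
        total[item.category] += 1
        # Case- and whitespace-insensitive exact match against the reference answer.
        if item.prediction.strip().lower() == item.answer.strip().lower():
            correct[item.category] += 1
    # Report only categories that actually appear in the evaluated items.
    return {c: correct[c] / total[c] for c in CATEGORIES if total[c]}
```

Reporting the categories separately, rather than as one pooled score, keeps a model's weaknesses in relational reasoning visible even when it counts objects well.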
Taxonomy
Topics: Spatial Cognition and Navigation · Multimodal Machine Learning Applications · Constraint Satisfaction and Optimization
