Co-VisiON: Co-Visibility ReasONing on Sparse Image Sets of Indoor Scenes
Chao Chen, Nobel Dang, Juexiao Zhang, Wenkai Sun, Pengfei Zheng, Xuhang He, Yimeng Ye, Jiasheng Zhang, Taarun Srinivas, and Chen Feng

TL;DR
This paper introduces the Co-VisiON benchmark to evaluate co-visibility reasoning in sparse indoor scenes, revealing current models' limitations and proposing a new multi-view baseline inspired by human cognition.
Contribution
The paper presents a new benchmark dataset and evaluation for co-visibility reasoning, and introduces a novel multi-view baseline model that improves over existing vision-only approaches.
Findings
Current models struggle with sparse co-visibility reasoning.
A proprietary vision-language model outperforms vision-only baselines.
The proposed Covis model narrows the gap to human performance.
Abstract
Humans exhibit a remarkable ability to recognize co-visibility (the 3D regions simultaneously visible in multiple images), even when these images are sparsely distributed across a complex scene. This ability is foundational to 3D vision and robotic perception, and it relies not only on low-level feature matching but also on high-level spatial reasoning and cognitive integration. Yet, it remains unclear whether current vision models can replicate this human-level proficiency. In this work, we introduce the Co-VisiON benchmark, designed to evaluate human-inspired co-visibility reasoning across more than 1,000 sparse-view indoor scenarios. Our results show that while co-visibility is often approached as a low-level feature-matching task, it remains challenging for existing vision models under sparse conditions. Notably, a proprietary vision-language model surpasses all vision-only baselines, but…
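The definition of co-visibility in the abstract can be made concrete with a small sketch. It assumes that each image comes with a set of IDs for the 3D points it sees, which is an illustrative simplification and not the benchmark's data format. The function name `covisibility_edges` and the `min_shared` threshold are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def covisibility_edges(visible_points, min_shared=1):
    """Return image pairs that share at least `min_shared` visible 3D points.

    `visible_points` maps an image name to the set of 3D point IDs seen in
    that image (an assumed input format, used here for illustration only).
    """
    edges = set()
    # Sorting the names makes each pair appear in one canonical order.
    for a, b in combinations(sorted(visible_points), 2):
        shared = visible_points[a] & visible_points[b]
        if len(shared) >= min_shared:
            edges.add((a, b))
    return edges


# Toy scene: img0 and img1 both see point 3; img2 sees nothing in common.
views = {
    "img0": {1, 2, 3},
    "img1": {3, 4},
    "img2": {5, 6},
}
edges = covisibility_edges(views)
```

Here the point sets are given, so the problem reduces to set intersection. In the sparse-view setting the abstract describes, a model must infer such links from the images alone, which is what makes the task hard.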
Taxonomy
Topics: Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization · Advanced Vision and Imaging
