Co-VisiON: Co-Visibility ReasONing on Sparse Image Sets of Indoor Scenes

Chao Chen, Nobel Dang, Juexiao Zhang, Wenkai Sun, Pengfei Zheng, Xuhang He, Yimeng Ye, Jiasheng Zhang, Taarun Srinivas, and Chen Feng

arXiv:2506.16805·cs.CV·August 12, 2025

Open Access

TL;DR

This paper introduces the Co-VisiON benchmark to evaluate co-visibility reasoning in sparse indoor scenes, revealing current models' limitations and proposing a new multi-view baseline inspired by human cognition.

Contribution

The paper presents a new benchmark dataset and evaluation for co-visibility reasoning, and introduces a novel multi-view baseline model that improves over existing vision-only approaches.

Findings

01. Current models struggle with sparse co-visibility reasoning.

02. A proprietary vision-language model outperforms vision-only baselines.

03. The proposed Covis model narrows the gap to human performance.

Abstract

Humans exhibit a remarkable ability to recognize co-visibility (the 3D regions simultaneously visible in multiple images), even when these images are sparsely distributed across a complex scene. This ability is foundational to 3D vision and robotic perception, and it relies not only on low-level feature matching but also on high-level spatial reasoning and cognitive integration. Yet it remains unclear whether current vision models can match this human-level proficiency. In this work, we introduce the Co-VisiON benchmark, designed to evaluate human-inspired co-visibility reasoning across more than 1,000 sparse-view indoor scenarios. Our results show that although co-visibility is often approached as a low-level feature-matching task, it remains challenging for existing vision models under sparse conditions. Notably, a proprietary vision-language model surpasses all vision-only baselines, but…
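To make the definition of co-visibility in the abstract concrete, here is a minimal sketch that treats two images as co-visible when they share at least one visible 3D region. It is an illustration only, not the benchmark's evaluation code or the authors' method: the function name `covisible_pairs`, the `min_shared` threshold, and the representation of each image as a set of integer 3D point IDs are all assumptions made for this example.

```python
from itertools import combinations


def covisible_pairs(visible_points, min_shared=1):
    """Return image pairs that share at least `min_shared` visible 3D point IDs.

    `visible_points` maps an image name to the set of 3D point IDs visible in
    that image (a hypothetical input format chosen for this sketch).
    """
    pairs = []
    # Visit every unordered pair of images once, in a stable sorted order.
    for a, b in combinations(sorted(visible_points), 2):
        shared = visible_points[a] & visible_points[b]
        if len(shared) >= min_shared:
            pairs.append((a, b))
    return pairs


# Toy data: img_0 and img_1 both see point 3; img_2 sees nothing in common.
views = {
    "img_0": {1, 2, 3},
    "img_1": {3, 4},
    "img_2": {5, 6},
}

print(covisible_pairs(views))  # [('img_0', 'img_1')]
```

In this toy setting the answer follows directly from known 3D visibility. The task described in the abstract is harder because a model must infer the same relation from sparse images alone, without access to such ground-truth point sets.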

Taxonomy

Topics: Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization · Advanced Vision and Imaging