TL;DR
SceneGlue introduces a scene-aware transformer framework for feature matching that leverages implicit and explicit scene information, trained without scene-level annotations, to improve accuracy and robustness in cross-view correspondence tasks.
Contribution
SceneGlue proposes a hybridizable matching paradigm that pairs implicit parallel attention with an explicit Visibility Transformer, giving the matcher scene awareness without scene-level ground-truth annotations.
Findings
Outperforms traditional methods in homography and pose estimation.
Enhances robustness and interpretability in feature matching.
Source code is publicly available at the provided GitHub link.
Abstract
Local feature matching plays a critical role in understanding the correspondence between cross-view images. However, traditional methods are constrained by the inherent local nature of feature descriptors, limiting their ability to capture non-local scene information that is essential for accurate cross-view correspondence. In this paper, we introduce SceneGlue, a scene-aware feature matching framework designed to overcome these limitations. SceneGlue leverages a hybridizable matching paradigm that integrates implicit parallel attention and explicit cross-view visibility estimation. The parallel attention mechanism simultaneously exchanges information among local descriptors within and across images, enhancing the scene's global context. To further enrich the scene awareness, we propose the Visibility Transformer, which explicitly categorizes features into visible and invisible regions,…
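To make the parallel attention idea in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that "parallel" means the within-image (self) and across-image (cross) messages are both computed from the same input descriptors and then summed into a residual update, and it uses identity projections in place of learned weights. The names `attend` and `parallel_attention_step` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, sources):
    """Scaled dot-product attention; keys and values are the raw source
    descriptors (identity projections, an assumption for brevity)."""
    dim = len(queries[0])
    messages = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in sources
        ]
        weights = softmax(scores)
        messages.append(
            [sum(w * v[d] for w, v in zip(weights, sources)) for d in range(dim)]
        )
    return messages


def parallel_attention_step(desc_a, desc_b):
    """One hypothetical update: self and cross messages are both computed
    from the same input state, then added to the descriptors."""
    self_a, cross_a = attend(desc_a, desc_a), attend(desc_a, desc_b)
    self_b, cross_b = attend(desc_b, desc_b), attend(desc_b, desc_a)

    def combine(desc, self_msg, cross_msg):
        return [
            [x + s + c for x, s, c in zip(row, s_row, c_row)]
            for row, s_row, c_row in zip(desc, self_msg, cross_msg)
        ]

    return combine(desc_a, self_a, cross_a), combine(desc_b, self_b, cross_b)


# Toy descriptors: three keypoints in image A, two in image B.
desc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
desc_b = [[0.5, 0.5], [1.0, -1.0]]
new_a, new_b = parallel_attention_step(desc_a, desc_b)
```

Each image keeps its own number of descriptors, while every updated descriptor now mixes information from both views. A real matcher would add learned projections, multiple heads, and normalization, none of which are shown here.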
