SceneGlue: Scene-Aware Transformer for Feature Matching without Scene-Level Annotation

Songlin Du; Xiaoyong Lu; Yaping Yan; Guobao Xiao; Xiaobo Lu; Takeshi Ikenaga

arXiv:2604.13941·cs.CV·April 16, 2026

TL;DR

SceneGlue is a scene-aware transformer framework for feature matching that combines implicit and explicit scene information and is trained without scene-level annotations, improving accuracy and robustness in cross-view correspondence tasks.

Contribution

The paper proposes a novel hybridizable matching paradigm with a Visibility Transformer, which provides scene awareness without scene-level ground-truth annotations.

Findings

01 Outperforms traditional methods in homography and pose estimation.

02 Enhances robustness and interpretability in feature matching.

03 Source code is publicly available on GitHub (songlin-du/SceneGlue).

Abstract

Local feature matching plays a critical role in understanding the correspondence between cross-view images. However, traditional methods are constrained by the inherent local nature of feature descriptors, limiting their ability to capture non-local scene information that is essential for accurate cross-view correspondence. In this paper, we introduce SceneGlue, a scene-aware feature matching framework designed to overcome these limitations. SceneGlue leverages a hybridizable matching paradigm that integrates implicit parallel attention and explicit cross-view visibility estimation. The parallel attention mechanism simultaneously exchanges information among local descriptors within and across images, enhancing the scene's global context. To further enrich the scene awareness, we propose the Visibility Transformer, which explicitly categorizes features into visible and invisible regions,…
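
The abstract describes parallel attention as exchanging information within and across images at the same time. The sketch below is one possible reading of that idea, not the authors' implementation: every name in it (`attend`, `parallel_attention`, `desc_a`, `desc_b`) is hypothetical, learned projections and multi-head structure are omitted, and the choice to sum the self and cross messages into a residual update is an assumption made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, sources):
    """Scaled dot-product attention; sources act as both keys and values."""
    dim = len(queries[0])
    messages = []
    for q in queries:
        scores = [
            sum(qi * si for qi, si in zip(q, s)) / math.sqrt(dim)
            for s in sources
        ]
        weights = softmax(scores)
        messages.append(
            [sum(w * s[d] for w, s in zip(weights, sources)) for d in range(dim)]
        )
    return messages


def parallel_attention(desc_a, desc_b):
    """Hypothetical layer: self and cross messages are both computed from
    the same input descriptors, then added to them (assumed fusion rule)."""

    def update(own, other):
        self_msg = attend(own, own)     # within-image context
        cross_msg = attend(own, other)  # cross-image context
        return [
            [x + s + c for x, s, c in zip(row, s_row, c_row)]
            for row, s_row, c_row in zip(own, self_msg, cross_msg)
        ]

    return update(desc_a, desc_b), update(desc_b, desc_a)


# Toy descriptors: 3 keypoints in image A, 2 in image B, dimension 2.
desc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
desc_b = [[0.5, 0.5], [1.0, -1.0]]
out_a, out_b = parallel_attention(desc_a, desc_b)
print(len(out_a), len(out_b))  # 3 2
```

The point of the sketch is only the data flow: both messages read the same, not-yet-updated descriptors, which contrasts with schemes that alternate self-attention and cross-attention in separate sequential layers.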

Code & Models

Repositories

songlin-du/SceneGlue
github
