TL;DR
GESS is a multi-cue learning framework that combines semantic cues with geometric cues (normals and depth stability) to improve local feature detection and description.
Contribution
It proposes a multi-cue guided framework with SDAK and UTCF modules that improves keypoint robustness and descriptor discriminability over methods that rely on appearance cues alone.
Findings
Outperforms state-of-the-art on four benchmarks.
Improves keypoint stability and descriptor discriminability.
Source code available at provided GitHub link.
Abstract
Robust local feature detection and description are foundational tasks in computer vision. Existing methods primarily rely on single appearance cues for modeling, leading to unstable keypoints and insufficient descriptor discriminability. In this paper, we propose a multi-cue guided local feature learning framework that leverages semantic and geometric cues to synergistically enhance detection robustness and descriptor discriminability. Specifically, we construct a joint semantic-normal prediction head and a depth stability prediction head atop a lightweight backbone. The former leverages a shared 3D vector field to deeply couple semantic and normal cues, thereby resolving optimization interference from heterogeneous inconsistencies. The latter quantifies the reliability of local regions from a geometric consistency perspective, providing deterministic guidance for robust keypoint…
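The abstract is cut off before it explains how the depth stability head guides keypoint selection, so the following is only a rough illustration of the general idea, not the authors' method. It assumes a hypothetical stability score of 1 / (1 + local depth variance) and a hypothetical selection rule that multiplies a detection score by that stability before taking the top k locations. The names `depth_stability` and `select_keypoints`, the window radius, and both formulas are assumptions introduced for this sketch.

```python
from typing import List, Tuple

Grid = List[List[float]]


def depth_stability(depth: Grid, radius: int = 1) -> Grid:
    """Hypothetical stability score: 1 / (1 + local depth variance).

    Flat depth regions score 1.0; regions straddling a depth
    discontinuity score lower. This formula is an assumption,
    not taken from the paper.
    """
    h, w = len(depth), len(depth[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Collect depth values in a (2*radius+1)^2 window, clipped at borders.
            window = [
                depth[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            mean = sum(window) / len(window)
            var = sum((d - mean) ** 2 for d in window) / len(window)
            out[y][x] = 1.0 / (1.0 + var)
    return out


def select_keypoints(score: Grid, stability: Grid, k: int) -> List[Tuple[int, int]]:
    """Pick the k locations with the highest stability-weighted score.

    Weighting by multiplication is an illustrative choice only.
    """
    weighted = [
        (score[y][x] * stability[y][x], y, x)
        for y in range(len(score))
        for x in range(len(score[0]))
    ]
    # Stable sort: ties keep row-major order.
    weighted.sort(key=lambda t: -t[0])
    return [(y, x) for _, y, x in weighted[:k]]


if __name__ == "__main__":
    # Toy depth map with a vertical discontinuity between columns 1 and 2.
    depth = [[1.0, 1.0, 10.0, 10.0] for _ in range(4)]
    score = [[1.0] * 4 for _ in range(4)]  # uniform detection score
    stab = depth_stability(depth)
    print(select_keypoints(score, stab, 2))
```

With a uniform detection score, the selected locations fall in the flat regions away from the depth edge, which matches the intuition that geometrically consistent regions yield more repeatable keypoints.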
