ContextDesc: Local Descriptor Augmentation with Cross-Modality Context
Zixin Luo, Tianwei Shen, Lei Zhou, Jiahui Zhang, Yao Yao, Shiwei Li, Tian Fang, Long Quan

TL;DR
This paper introduces ContextDesc, a novel framework that enhances local feature descriptors with cross-modality contextual information, improving geometric matching performance across diverse scenes.
Contribution
It proposes a unified learning framework that combines visual and geometric context for local descriptors and introduces an N-pair loss for better convergence.
Findings
Significant improvement on large-scale benchmarks
Lightweight augmentation scheme
Enhanced generalization in geometric matching
Abstract
Most existing studies on learning local features focus on patch-based descriptions of individual keypoints while neglecting the spatial relations established by their keypoint locations. In this paper, we go beyond the local detail representation by introducing context awareness to augment off-the-shelf local feature descriptors. Specifically, we propose a unified learning framework that leverages and aggregates cross-modality contextual information, including (i) visual context from high-level image representation, and (ii) geometric context from 2D keypoint distribution. Moreover, we propose an effective N-pair loss that eschews the empirical hyper-parameter search and improves convergence. The proposed augmentation scheme is lightweight compared with the raw local feature description, yet improves remarkably on several large-scale benchmarks with diversified…
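To make the idea of an N-pair loss over matched descriptors concrete, here is a minimal sketch of the generic softmax-style formulation. The abstract does not give the paper's exact loss, so this is not the authors' implementation. The function name `n_pair_loss` and the `scale` argument are hypothetical, and `scale` is passed as a fixed number purely for simplicity. Row i of `anchors` is assumed to match row i of `positives`, and every other row in the batch acts as a negative.

```python
import numpy as np

def n_pair_loss(anchors, positives, scale=1.0):
    """Generic N-pair loss (illustrative sketch, not the paper's exact loss).

    anchors, positives: (N, D) arrays; row i of each forms a matching pair.
    scale: hypothetical multiplier on the cosine similarities (an assumption).
    """
    # L2-normalize so that dot products are cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    # (N, N) similarity matrix: entry (i, j) compares anchor i with positive j.
    logits = scale * (a @ p.T)

    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Cross-entropy with the diagonal (the true matches) as targets.
    return float(-np.mean(np.diag(log_prob)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    desc = rng.normal(size=(8, 16))
    print(n_pair_loss(desc, desc, scale=20.0))        # correctly paired batch
    print(n_pair_loss(desc, desc[::-1], scale=20.0))  # mismatched pairing
```

In this sketch a correctly paired batch yields a lower loss than a mismatched one, and with `scale=0.0` the loss reduces to log N because every candidate is equally likely.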
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization
