Digging Into Self-Supervised Learning of Feature Descriptors
Iaroslav Melekhov, Zakaria Laskar, Xiaotian Li, Shuzhe Wang, and Juho Kannala

TL;DR
This paper improves self-supervised learning of local image descriptors by expanding negative mining strategies and combining synthetic transformations with visual augmentations, resulting in more robust and discriminative features for geometric tasks.
Contribution
The authors analyze the limitations of existing self-supervised methods and introduce a set of improvements, including in-batch hard negative mining and a coarse-to-fine scheme for mining local hard negatives, which together yield more discriminative feature descriptors.
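The difference between in-pair and in-batch hard negative mining can be illustrated with a short NumPy sketch. It assumes a HardNet-style triplet margin loss over L2-normalized descriptors, with each batch holding N image pairs and K matched keypoints per pair. The function names and the margin value are hypothetical and are not taken from the paper.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale descriptors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def hardest_negative_distances(anchors, positives, in_batch=True):
    """Return (positive, hardest-negative) distances for every anchor.

    anchors, positives: arrays of shape (N, K, D), i.e. N image pairs with
    K matched keypoints each; anchors[n, k] corresponds to positives[n, k].
    in_batch=False searches negatives only within the anchor's own pair;
    in_batch=True searches over the positives of all N pairs.
    """
    n, k, d = anchors.shape
    if in_batch:
        a = anchors.reshape(n * k, d)
        p = positives.reshape(n * k, d)
        dist = np.linalg.norm(a[:, None, :] - p[None, :, :], axis=-1)  # (NK, NK)
        pos = np.diag(dist).copy()
        np.fill_diagonal(dist, np.inf)  # never pick the true match as a negative
        return pos, dist.min(axis=1)
    dist = np.linalg.norm(anchors[:, :, None, :] - positives[:, None, :, :], axis=-1)  # (N, K, K)
    pos = np.diagonal(dist, axis1=1, axis2=2).copy().reshape(-1)
    idx = np.arange(k)
    dist[:, idx, idx] = np.inf  # mask the true match inside each pair
    return pos, dist.min(axis=2).reshape(-1)


def triplet_margin_loss(pos, neg, margin=1.0):
    """Hinge loss pushing each hardest negative at least `margin` farther than its positive."""
    return float(np.maximum(0.0, margin + pos - neg).mean())


rng = np.random.default_rng(0)
anchors = l2_normalize(rng.normal(size=(4, 8, 16)))
positives = l2_normalize(anchors + 0.1 * rng.normal(size=anchors.shape))
for mode in (False, True):
    pos, neg = hardest_negative_distances(anchors, positives, in_batch=mode)
    print("in-batch" if mode else "in-pair", triplet_margin_loss(pos, neg))
```

Because the in-batch search covers a superset of the in-pair candidates, its hardest negative is always at least as close to the anchor, so widening the search space can only make the mined negatives harder.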
Findings
Outperforms fully- and weakly-supervised methods on geometric benchmarks.
Widening the search space for hard negative mining from in-pair to in-batch consistently improves descriptor quality.
Combining synthetic transformations with visual augmentations enhances invariance.
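The following sketch shows how synthetic homographies and visual augmentations can be combined to produce supervision without manual labels. It assumes homographies are sampled by randomly perturbing the four image corners and that the visual augmentation is simple brightness, contrast, and noise jitter. The helper names and parameter values are hypothetical, and warping the image itself is omitted to keep the example dependency-free.

```python
import numpy as np


def homography_from_corners(src, dst):
    """Solve the 3x3 homography (with H[2, 2] = 1) mapping 4 src points to 4 dst points."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.extend([u, v])
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def random_homography(rng, width, height, max_shift=16.0):
    """Sample a homography by jittering each image corner by up to max_shift pixels."""
    corners = np.array([[0, 0], [width - 1, 0], [width - 1, height - 1], [0, height - 1]], dtype=float)
    jittered = corners + rng.uniform(-max_shift, max_shift, size=corners.shape)
    return homography_from_corners(corners, jittered)


def warp_points(H, pts):
    """Map (M, 2) pixel coordinates through H using homogeneous coordinates."""
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]


def photometric_augment(img, rng, brightness=0.2, contrast=0.3, noise_std=0.02):
    """Jitter contrast and brightness and add Gaussian noise; geometry is untouched."""
    mean = img.mean()
    out = (img - mean) * rng.uniform(1 - contrast, 1 + contrast) + mean
    out = out + rng.uniform(-brightness, brightness) + rng.normal(0.0, noise_std, img.shape)
    return np.clip(out, 0.0, 1.0)


rng = np.random.default_rng(0)
image = rng.uniform(size=(48, 64))                  # grayscale image in [0, 1], shape (H, W)
H = random_homography(rng, width=64, height=48, max_shift=8.0)
kp_a = rng.uniform([0, 0], [63, 47], size=(10, 2))  # keypoints in view A
kp_b = warp_points(H, kp_a)                         # ground-truth matches in view B
view_a = photometric_augment(image, rng)            # appearance changes only
```

The homography alone defines the ground-truth correspondences, while the photometric changes alter only appearance, so the correspondence labels remain exact under any visual augmentation applied on top.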
Abstract
Fully-supervised CNN-based approaches for learning local image descriptors have shown remarkable results in a wide range of geometric tasks. However, most of them require per-pixel ground-truth keypoint correspondence data which is difficult to acquire at scale. To address this challenge, recent weakly- and self-supervised methods can learn feature descriptors from relative camera poses or using only synthetic rigid transformations such as homographies. In this work, we focus on understanding the limitations of existing self-supervised approaches and propose a set of improvements that combined lead to powerful feature descriptors. We show that increasing the search space from in-pair to in-batch for hard negative mining brings consistent improvement. To enhance the discriminativeness of feature descriptors, we propose a coarse-to-fine method for mining local hard negatives from a wider…
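The abstract is truncated before it fully describes the coarse-to-fine scheme, so the following is only one hypothetical reading of it: a coarse stage ranks the other image pairs in the batch by a global image descriptor, and a fine stage searches for the hardest local negative only among the anchor's own pair and the top-ranked pairs. The global descriptor, the function name, and `top_k` are assumptions for illustration, not the authors' method.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale descriptors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def coarse_to_fine_hard_negatives(global_desc, anchors, positives, top_k=2):
    """Hardest-negative distance per anchor, searched in a retrieved subset of the batch.

    global_desc: (N, G) one global descriptor per image pair (hypothetical input).
    anchors, positives: (N, K, D) local descriptors; anchors[n, k] matches positives[n, k].
    top_k: number of other pairs kept by the coarse stage (must be <= N - 1).
    """
    n, k, d = anchors.shape
    g = l2_normalize(global_desc)
    sim = g @ g.T
    np.fill_diagonal(sim, -np.inf)  # the anchor's own pair is added explicitly below
    neighbours = np.argsort(-sim, axis=1)[:, :top_k]  # coarse stage: most similar pairs
    neg = np.empty((n, k))
    for i in range(n):
        cand_pairs = np.concatenate(([i], neighbours[i]))  # own pair first
        cand = positives[cand_pairs].reshape(-1, d)        # ((top_k + 1) * K, D)
        dist = np.linalg.norm(anchors[i][:, None, :] - cand[None, :, :], axis=-1)
        dist[np.arange(k), np.arange(k)] = np.inf  # true matches occupy the first K columns
        neg[i] = dist.min(axis=1)                  # fine stage: hardest local negative
    return neg.reshape(-1)


rng = np.random.default_rng(0)
anchors = l2_normalize(rng.normal(size=(5, 6, 8)))
positives = l2_normalize(anchors + 0.1 * rng.normal(size=anchors.shape))
print(coarse_to_fine_hard_negatives(rng.normal(size=(5, 4)), anchors, positives, top_k=2))
```

With `top_k = N - 1` this reduces to exhaustive in-batch mining; smaller values restrict the expensive local search to the pairs the coarse stage judges most similar.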
