TL;DR
This paper introduces an end-to-end differentiable method for learning local keypoint descriptors from weakly-labeled data, without requiring detailed keypoint correspondences. The method applies to face matching, video analysis, and 3D shape retrieval.
Contribution
It proposes a new approach to learn local descriptors from weakly-labeled data, bypassing the need for explicit keypoint matching annotations, and extends to unlabeled videos and 3D models.
Findings
Effective learning from weakly-labeled pairs of keypoint bags
Improved performance with hard negative mining
Versatile application to face matching, videos, and 3D shapes
Abstract
Current best local descriptors are learned on a large dataset of matching and non-matching keypoint pairs. However, data of this kind is not always available since detailed keypoint correspondences can be hard to establish. On the other hand, we can often obtain labels for pairs of keypoint bags. For example, keypoint bags extracted from two images of the same object under different views form a matching pair, and keypoint bags extracted from images of different objects form a non-matching pair. On average, matching pairs should contain more corresponding keypoints than non-matching pairs. We describe an end-to-end differentiable architecture that enables the learning of local keypoint descriptors from such weakly-labeled data. Additionally, we discuss how to improve the method by incorporating the procedure of mining hard negatives. We also show how our approach can be used to learn…
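To make the bag-level supervision concrete, here is a minimal sketch of how a label on a pair of keypoint bags could drive a loss on descriptors. It is not the paper's architecture or loss, which the abstract does not specify. The bag score (mean best cosine similarity), the hinge loss, and the names `bag_similarity`, `bag_pair_loss`, and `hardest_negatives` are all assumptions made for illustration.

```python
import math


def normalize(v):
    """Scale a descriptor to unit length (guarding against zero vectors)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / max(n, 1e-12) for x in v]


def bag_similarity(bag_a, bag_b):
    """Hypothetical bag score: for each descriptor in bag_a, take its best
    cosine similarity to any descriptor in bag_b, then average."""
    a = [normalize(v) for v in bag_a]
    b = [normalize(v) for v in bag_b]
    best = [max(sum(x * y for x, y in zip(u, w)) for w in b) for u in a]
    return sum(best) / len(best)


def bag_pair_loss(bag_a, bag_b, is_match):
    """Hinge loss on the bag score (an assumed choice): matching pairs are
    pushed toward high similarity, non-matching pairs toward low."""
    y = 1.0 if is_match else -1.0
    return max(0.0, 1.0 - y * bag_similarity(bag_a, bag_b))


def hardest_negatives(negative_pairs, k):
    """Assumed form of hard negative mining: keep the k non-matching
    bag pairs that currently look most similar."""
    ranked = sorted(negative_pairs, key=lambda p: bag_similarity(*p), reverse=True)
    return ranked[:k]


# Toy 2-D "descriptors": each bag is a list of descriptor vectors.
bag_a = [[1.0, 0.0], [0.0, 1.0]]
bag_b = [[1.0, 0.0], [0.0, 1.0]]
bag_c = [[-1.0, 0.0], [0.0, -1.0]]

loss_match = bag_pair_loss(bag_a, bag_b, is_match=True)
loss_nonmatch = bag_pair_loss(bag_a, bag_c, is_match=False)
```

Only the bag-level label (`is_match`) enters the loss; no individual keypoint correspondence is ever supplied. In a trainable version the descriptors would come from a network and the same computation would be written in an autodiff framework.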
