LF-Net: Learning Local Features from Images
Yuki Ono, Eduard Trulls, Pascal Fua, Kwang Moo Yi

TL;DR
LF-Net is a deep architecture and training strategy that learns a complete local feature pipeline from image collections without human supervision. It uses depth and relative camera pose as the training signal, outperforms prior methods on sparse feature matching, and runs at 60+ fps on QVGA images.
Contribution
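The paper proposes a deep architecture and training strategy that learn a local feature pipeline from scratch without manual labels, using depth and relative camera pose as supervision. Because building the training target is non-differentiable, that step is confined to one branch of a two-branch setup while the other branch stays differentiable.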
Findings
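Outperforms the state of the art on sparse feature matching on both indoor and outdoor datasets
Runs at 60+ fps on QVGA images
Trains on indoor data with depth from 3D sensors and on outdoor data with depth estimated by off-the-shelf Structure-from-Motion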
Abstract
We present a novel deep architecture and a training strategy to learn a local feature pipeline from scratch, using collections of images without the need for human supervision. To do so we exploit depth and relative camera pose cues to create a virtual target that the network should achieve on one image, provided the outputs of the network for the other image. While this process is inherently non-differentiable, we show that we can optimize the network in a two-branch setup by confining it to one branch, while preserving differentiability in the other. We train our method on both indoor and outdoor datasets, with depth data from 3D sensors for the former, and depth estimates from an off-the-shelf Structure-from-Motion solution for the latter. Our models outperform the state of the art on sparse feature matching on both datasets, while running at 60+ fps for QVGA images.
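The abstract does not spell out how depth and relative pose produce a target in the other image. A minimal sketch of the standard geometric step this kind of supervision usually relies on is shown below: back-project a pixel with known depth, move it into the second camera's frame, and re-project it. Everything here is an illustrative assumption rather than the authors' code: the function name `warp_pixel`, the pinhole model with intrinsics shared by both views, and the convention that `R` and `t` map camera-i coordinates to camera-j coordinates. It is written in plain Python to stay dependency-free.

```python
def warp_pixel(u, v, depth, K, R, t):
    """Map pixel (u, v) with known depth from image i into image j.

    Hypothetical helper, not taken from the LF-Net code base.
    K: (fx, fy, cx, cy) pinhole intrinsics, assumed shared by both views.
    R: 3x3 rotation as a list of rows; t: translation 3-vector.
       Assumed convention: X_j = R * X_i + t.
    Returns (u', v') in image j, or None if the point is not in front
    of camera j.
    """
    fx, fy, cx, cy = K
    # Back-project the pixel to a 3D point in camera i's frame.
    X = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    # Rigidly transform the point into camera j's frame.
    Xj = [sum(R[r][c] * X[c] for c in range(3)) + t[r] for r in range(3)]
    if Xj[2] <= 0:
        return None
    # Project back to pixel coordinates with the same intrinsics.
    return (fx * Xj[0] / Xj[2] + cx, fy * Xj[1] / Xj[2] + cy)


# Example on a QVGA-sized (320x240) image: the centre pixel at depth 2.0,
# seen from a second camera whose frame is offset by 0.5 along x.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
K = (100.0, 100.0, 160.0, 120.0)
print(warp_pixel(160.0, 120.0, 2.0, K, identity, (0.5, 0.0, 0.0)))
```

In a two-branch training setup such as the one the abstract describes, a warp of this kind would be applied to the outputs computed for one image to form the target for the other, with gradients flowing only through the branch that is kept differentiable.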
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques
