Descriptor Distillation: a Teacher-Student-Regularized Framework for Learning Local Descriptors
Yuzhen Liu, Qiulei Dong

TL;DR
This paper introduces DesDis, a teacher-student regularized framework for local descriptor learning in which a student model distills knowledge from a pre-trained teacher model. The resulting student models are reported to beat their teachers in accuracy or speed.
Contribution
The paper proposes a novel Descriptor Distillation framework that enhances local descriptor learning through teacher-student regularization, leading to better performance and faster inference.
Findings
Student models outperform their teachers in accuracy or speed.
Equal-weight student models achieve better performance than teachers.
Light-weight models are up to 8 times faster with comparable accuracy.
Abstract
Learning a fast and discriminative patch descriptor is a challenging topic in computer vision. Recently, many existing works focus on training various descriptor learning networks by minimizing a triplet loss (or its variants), which is expected to decrease the distance between each positive pair and increase the distance between each negative pair. However, such an expectation has to be lowered due to the non-perfect convergence of network optimizer to a local solution. Addressing this problem and the open computational speed problem, we propose a Descriptor Distillation framework for local descriptor learning, called DesDis, where a student model gains knowledge from a pre-trained teacher model, and it is further enhanced via a designed teacher-student regularizer. This teacher-student regularizer is to constrain the difference between the positive (also negative) pair similarity from…
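As a rough illustration of the triplet objective the abstract refers to, the sketch below computes a standard margin-based triplet loss over small descriptor vectors. It is not the authors' implementation: the function names, the Euclidean distance, and the margin value are assumptions made for this example, and the teacher-student regularizer is omitted because the abstract is cut off before it is fully specified.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two descriptors given as equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def triplet_margin_loss(anchors, positives, negatives, margin=1.0):
    """Mean hinge loss over (anchor, positive, negative) descriptor triplets.

    Each triplet contributes max(0, margin + d(a, p) - d(a, n)), so the loss
    falls as positive pairs move closer and negative pairs move apart.
    The margin of 1.0 is an illustrative default, not a value from the paper.
    """
    losses = []
    for a, p, n in zip(anchors, positives, negatives):
        d_pos = l2_distance(a, p)  # distance for the matching pair
        d_neg = l2_distance(a, n)  # distance for the non-matching pair
        losses.append(max(0.0, margin + d_pos - d_neg))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    anchors = [[0.0, 0.0], [1.0, 0.0]]
    positives = [[0.0, 0.0], [1.0, 0.0]]
    negatives = [[3.0, 4.0], [1.0, 0.5]]
    print(triplet_margin_loss(anchors, positives, negatives))
```

In the first triplet the negative is already farther away than the margin requires, so it contributes nothing. Only the second triplet, whose negative sits closer than the margin, adds to the loss.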
Taxonomy
Topics: Remote-Sensing Image Classification · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Triplet Loss
