Segment Anything Model is a Good Teacher for Local Feature Learning

Jingqian Wu; Rongtao Xu; Zach Wood-Doughty; Changwei Wang; Shibiao Xu; Edmund Y. Lam

arXiv:2309.16992·cs.CV·October 16, 2025·2 cites

Open Access · 1 Repo · 3 Reviews

TL;DR

This paper introduces SAMFeat, a novel local feature learning framework guided by the Segment Anything Model (SAM), which enhances feature description and detection through semantic relation distillation, semantic grouping, and edge attention, achieving superior results in image matching and localization.

Contribution

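The paper presents SAMFeat, a new approach that leverages SAM as a teacher for local feature learning. It introduces Attention-weighted Semantic Relation Distillation (ASRD), Weakly Supervised Contrastive learning based on semantic grouping (WSC), and Edge Attention Guidance (EAG) to improve performance on limited datasets.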
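As an illustration of the edge attention idea, the sketch below derives a boundary map from a segmentation label map (standing in for SAM's masks) and turns it into per-pixel weights that emphasize edge regions. It is a minimal NumPy sketch under stated assumptions: `boundary_map`, `edge_attention_weights`, and the `edge_weight` parameter are hypothetical names, and the weighting scheme is illustrative rather than the one SAMFeat actually uses.

```python
import numpy as np


def boundary_map(labels: np.ndarray) -> np.ndarray:
    """Mark pixels whose 4-neighbour (left/right/up/down) has a different segment label.

    labels: (H, W) integer segment ids, e.g. rasterized SAM masks (assumption).
    Returns a boolean (H, W) map; both sides of each boundary are marked.
    """
    edges = np.zeros(labels.shape, dtype=bool)
    horiz = labels[:, 1:] != labels[:, :-1]  # label changes between horizontal neighbours
    vert = labels[1:, :] != labels[:-1, :]   # label changes between vertical neighbours
    edges[:, :-1] |= horiz
    edges[:, 1:] |= horiz
    edges[:-1, :] |= vert
    edges[1:, :] |= vert
    return edges


def edge_attention_weights(labels: np.ndarray, edge_weight: float = 2.0) -> np.ndarray:
    """Hypothetical per-pixel weights: edge_weight on segment boundaries, 1.0 elsewhere."""
    return np.where(boundary_map(labels), edge_weight, 1.0)


labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [2, 2, 2, 2]])
print(edge_attention_weights(labels))
```

Such weights could, for example, scale a per-pixel detection or description loss so that errors near object boundaries count more; how SAMFeat injects the edge signal into its network is not specified here.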

Findings

1. Outperforms previous local feature methods on image matching tasks.

2. Achieves superior results in long-term visual localization.

3. Demonstrates effective semantic-guided feature learning.

Abstract

Local feature detection and description play an important role in many computer vision tasks, as they are expected to detect and describe keypoints in "any scene" for "any downstream task". Data-driven local feature learning methods rely on pixel-level correspondences for training, which are challenging to acquire at scale, hindering further performance improvements. In this paper, we propose SAMFeat, which introduces SAM (Segment Anything Model), a foundation model trained on 11 million images, as a teacher to guide local feature learning and thereby achieve higher performance on limited datasets. To do so, we first construct an auxiliary task of Attention-weighted Semantic Relation Distillation (ASRD), which distills feature relations carrying category-agnostic semantic information learned by the SAM encoder into a local feature learning network, to improve local feature…
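The relation-distillation idea can be sketched as matching pairwise similarity matrices between student descriptors and teacher (SAM encoder) features sampled at the same pixels. This is a minimal NumPy sketch under stated assumptions: it uses cosine similarity and a squared-error penalty, and the optional `weights` argument is a hypothetical stand-in for attention weighting, not SAMFeat's actual ASRD formulation.

```python
import numpy as np


def cosine_similarity_matrix(feats: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the N rows of an (N, C) feature array."""
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    unit = feats / np.clip(norms, 1e-8, None)
    return unit @ unit.T


def relation_distillation_loss(student: np.ndarray, teacher: np.ndarray, weights=None) -> float:
    """Mean (optionally weighted) squared difference between student and teacher relation matrices.

    student: (N, Cs) student descriptors at N sampled pixels
    teacher: (N, Ct) teacher features at the same N pixels; Ct may differ from Cs
    weights: optional (N,) per-pixel weights (hypothetical attention weighting)
    """
    s = cosine_similarity_matrix(student)
    t = cosine_similarity_matrix(teacher)
    diff = (s - t) ** 2
    if weights is None:
        return float(diff.mean())
    # Weight each pixel pair by the product of its two per-pixel weights.
    w = np.outer(weights, weights)
    return float((w * diff).sum() / w.sum())


rng = np.random.default_rng(0)
student = rng.normal(size=(6, 128))   # e.g. 128-D local descriptors
teacher = rng.normal(size=(6, 256))   # e.g. 256-D SAM encoder features
print(relation_distillation_loss(student, teacher))
```

Because the loss compares N×N relation matrices rather than raw features, the student and teacher may have different channel dimensions, which makes this form of distillation convenient for a compact local-feature network learning from a large encoder.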

Peer Reviews

Decision·Submitted to ICLR 2024

Reviewer 01 · Rating 5 (marginally below the acceptance threshold) · Confidence 3

Strengths

1. The authors proposed to leverage the foundation segmentation model SAM for local feature learning. As highlighted in the paper, this work is the first to incorporate SAM into local feature learning by distilling knowledge from SAM. 2. The authors proposed three techniques to transfer SAM's fine-grained image understanding to the proposed local feature learning pipeline, resulting in a new local feature detector called SAMFeat. 3. The experimental results on

Weaknesses

1. The proposed method in this work is heuristic and incremental. Although the combination of all three techniques achieves the best performance, it is not clear how each heuristic technique improves local feature learning and, in turn, the final performance. I would highly suggest that the authors study more deeply how the proposed techniques contribute to the final performance. 2. It is not clear how much training overhead the extra loss functions add. For ex

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

(1) The authors integrate the strengths of existing frameworks, effectively utilizing the SAM foundation model and successfully distilling its knowledge into the network for local descriptor learning. It is a good example of leveraging the knowledge of large models to enhance domain-specific tasks. (2) The article is clearly written in most parts, enabling readers to quickly grasp the core technical points. The proposed approach is quite reasonable. (3) The experimental results

Weaknesses

(1) My first concern is about the novelty of the paper. It is commendable to leverage SAM to enhance model performance in the corresponding tasks. However, acquiring structured information through SAM (PSRD) and using semantic grouping to construct positive and negative samples for contrastive learning have already been briefly discussed in previous works (SFD2, TPR). The paper therefore reads more like an integration of existing schemes combined with the SAM model. Hence, its technica
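The use of semantic grouping to build positive and negative samples can be illustrated with a supervised-contrastive-style loss in which descriptors from the same group are positives and all other descriptors are negatives. This is a minimal NumPy sketch under stated assumptions: `grouped_contrastive_loss` and its `temperature` default are hypothetical, and the integer group ids stand in for SAM-derived segments; it is not the exact loss used in SAMFeat, SFD2, or TPR.

```python
import numpy as np


def grouped_contrastive_loss(desc: np.ndarray, groups: np.ndarray, temperature: float = 0.1) -> float:
    """SupCon-style loss: descriptors sharing a group id are positives, all others negatives.

    desc: (N, C) descriptors; groups: (N,) integer group ids (e.g. from SAM masks).
    """
    n = len(groups)
    if n < 2:
        return 0.0
    unit = desc / np.clip(np.linalg.norm(desc, axis=1, keepdims=True), 1e-8, None)
    logits = unit @ unit.T / temperature
    not_self = ~np.eye(n, dtype=bool)
    positives = (groups[:, None] == groups[None, :]) & not_self
    # Exclude self-similarity from the softmax denominator.
    logits = np.where(not_self, logits, -np.inf)
    # Numerically stable log-softmax over each row.
    row_max = logits.max(axis=1, keepdims=True)
    log_prob = logits - row_max - np.log(np.exp(logits - row_max).sum(axis=1, keepdims=True))
    # Average negative log-probability of the positives, per anchor that has any.
    losses = [-log_prob[i, positives[i]].mean() for i in range(n) if positives[i].any()]
    return float(np.mean(losses)) if losses else 0.0


desc = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
print(grouped_contrastive_loss(desc, np.array([0, 0, 1, 1])))  # groups match geometry: low loss
print(grouped_contrastive_loss(desc, np.array([0, 1, 0, 1])))  # groups contradict geometry: high loss
```

The loss is small when descriptors from the same segment already point in similar directions and grows when group membership contradicts descriptor geometry, which is the pull-together and push-apart behavior the reviewer refers to.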

Reviewer 03 · Rating 5 (marginally below the acceptance threshold) · Confidence 5

Strengths

1. This paper explored a way to harness the power of the Segment Anything Model (SAM) for distillation into local features. It shows the potential of visual foundation models. 2. Experiment-wise, it reaches state-of-the-art results for image matching on the HPatches dataset and for the visual localization task on the Aachen v1.1 dataset. 3. The authors provide open-source code

Weaknesses

Although I believe in the soundness of the good results the authors have demonstrated, a major issue that makes me skeptical is whether the contribution and novelty are substantial enough to warrant a full paper. Many of the techniques used in the paper are borrowed from others' implementations. For example, the Pixel Semantic Relational Distillation (PSRD) compares two similarity matrices, which is a widely used knowledge distillation loss [1]. Then the semantic grouping is from the origi

Code & Models

Repositories

vignywang/samfeat
PyTorch · Official

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Advanced Neural Network Applications

Methods: Segment Anything Model · Contrastive Learning