A Streamlined Attention-Based Network for Descriptor Extraction

Mattia D'Urso; Emanuele Santellani; Christian Sormann; Mattia Rossi; Andreas Kuhn; Friedrich Fraundorfer

arXiv:2601.13126 · cs.CV · January 21, 2026

Open Access

TL;DR

SANDesc is a lightweight, attention-based neural network for keypoint descriptor extraction. Built on an efficient residual U-Net architecture, it improves matching accuracy across multiple datasets without altering the underlying keypoint detector.

Contribution

The paper introduces SANDesc, a novel attention-enhanced network architecture for descriptor extraction, and a new urban dataset for evaluating feature extractors.

Findings

1. SANDesc outperforms existing descriptors on multiple benchmarks.

2. The model achieves high accuracy with only 2.4 million parameters.

3. The new urban dataset enables comprehensive evaluation of feature extractors.

Abstract

We introduce SANDesc, a Streamlined Attention-Based Network for Descriptor extraction that aims to improve on existing architectures for keypoint description. Our descriptor network learns to compute descriptors that improve matching without modifying the underlying keypoint detector. We employ a revised U-Net-like architecture enhanced with Convolutional Block Attention Modules and residual paths, enabling effective local representation while maintaining computational efficiency. We refer to the building blocks of our model as Residual U-Net Blocks with Attention. The model is trained using a modified triplet loss in combination with a curriculum learning-inspired hard negative mining strategy, which improves training stability. Extensive experiments on HPatches, MegaDepth-1500, and the Image Matching Challenge 2021 show that training SANDesc on top of existing keypoint detectors…
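
The abstract does not spell out the modified triplet loss or the mining schedule, so the following is only a generic sketch of a triplet margin loss with in-batch negative mining, not the authors' formulation. The function names, the `margin` default, and the `rank` parameter (a hypothetical knob that a curriculum could lower over time to move from easier to harder negatives) are all assumptions made for illustration.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss_mined(anchors, positives, margin=1.0, rank=0):
    """Mean triplet margin loss with in-batch negative mining (illustrative).

    anchors[i] and positives[i] describe the same keypoint in two images.
    For anchor i, every positives[j] with j != i is a candidate negative.
    rank=0 selects the closest (hardest) negative; larger ranks select
    easier ones. This is an assumed scheme, not the one from the paper.
    """
    n = len(anchors)
    if n < 2:
        raise ValueError("need at least two pairs to mine negatives")
    total = 0.0
    for i in range(n):
        d_pos = l2_distance(anchors[i], positives[i])
        # Distances to all non-matching descriptors, closest first.
        d_negs = sorted(
            l2_distance(anchors[i], positives[j]) for j in range(n) if j != i
        )
        # Clamp rank so it never runs past the available negatives.
        d_neg = d_negs[min(rank, len(d_negs) - 1)]
        # Hinge: penalize only when the negative is not at least
        # `margin` farther away than the positive.
        total += max(0.0, margin + d_pos - d_neg)
    return total / n
```

Under this sketch, calling the function with a decreasing `rank` across epochs would start training on easier negatives and end on the hardest ones, which is one plausible reading of a curriculum-inspired mining strategy.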

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Automated Road and Building Extraction · Multimodal Machine Learning Applications