ERVD: An Efficient and Robust ViT-Based Distillation Framework for Remote Sensing Image Retrieval

Le Dong; Qixuan Cao; Lei Pu; Fangfang Wu; Weisheng Dong; Xin Li; Guangming Shi

arXiv:2412.18136·cs.CV·February 10, 2026

TL;DR

This paper introduces ERVD, a Vision Transformer (ViT)-based distillation framework for remote sensing image retrieval, reported to improve both retrieval accuracy and computational efficiency over existing methods.

Contribution

The paper proposes a new ViT-based distillation framework specifically designed for remote sensing image retrieval, enhancing robustness and efficiency.

Findings

01 Achieves higher retrieval accuracy than baseline methods
02 Reduces computational complexity in image retrieval tasks
03 Demonstrates robustness across diverse remote sensing datasets

Code & Models

Repositories

milkyfun0/ERVD (PyTorch, official)

Taxonomy

Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Data Management and Algorithms

Methods: Linear Layer · Softmax · Layer Normalization · Residual Connection · Attention Is All You Need · Dense Connections · Multi-Head Attention · Vision Transformer