Evidential Transformers for Improved Image Retrieval

Danilo Dordevic; Suryansh Kumar

arXiv:2409.01082·cs.CV·September 9, 2025

Open Access

TL;DR

This paper presents the Evidential Transformer, a probabilistic model that enhances image retrieval robustness and accuracy by integrating uncertainty estimation and global context, outperforming previous methods on standard datasets.

Contribution

Introduces the Evidential Transformer, which combines evidential (probabilistic) classification with the Global Context Vision Transformer (GC ViT) architecture for more accurate and reliable image retrieval.

Findings

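1. Improved on state-of-the-art retrieval results on the Stanford Online Products (SOP) and CUB-200-2011 datasets.

2. Showed that evidential classification surpasses traditional multiclass classification training as a baseline for deep metric learning.

3. Set a new benchmark in content-based image retrieval (CBIR) in all test settings on both datasets.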

Abstract

We introduce the Evidential Transformer, an uncertainty-driven transformer model for improved and robust image retrieval. In this paper, we make several contributions to content-based image retrieval (CBIR). We incorporate probabilistic methods into image retrieval, achieving robust and reliable results, with evidential classification surpassing traditional training based on multiclass classification as a baseline for deep metric learning. Furthermore, we improve the state-of-the-art retrieval results on several datasets by leveraging the Global Context Vision Transformer (GC ViT) architecture. Our experimental results consistently demonstrate the reliability of our approach, setting a new benchmark in CBIR in all test settings on the Stanford Online Products (SOP) and CUB-200-2011 datasets.
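
The abstract does not spell out the evidential formulation, so the following is only a rough sketch of one common approach to evidential classification, not the authors' implementation. It assumes the network's logits are mapped to non-negative "evidence" via softplus and treated as parameters of a Dirichlet distribution over classes; the function names `softplus` and `evidential_outputs` and the example logits are hypothetical.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable softplus; maps a logit to non-negative evidence."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def evidential_outputs(logits: list[float]) -> tuple[list[float], float]:
    """Return expected class probabilities and a scalar uncertainty in (0, 1].

    Assumption (not taken from the paper): evidence = softplus(logit),
    Dirichlet parameters alpha = evidence + 1.
    """
    num_classes = len(logits)
    evidence = [softplus(z) for z in logits]
    alpha = [e + 1.0 for e in evidence]    # Dirichlet concentration parameters
    strength = sum(alpha)                  # total Dirichlet strength
    probs = [a / strength for a in alpha]  # mean of the Dirichlet
    uncertainty = num_classes / strength   # approaches 1 when evidence is near zero
    return probs, uncertainty


# Strong evidence for class 0 versus almost no evidence for any class.
confident_probs, confident_u = evidential_outputs([12.0, -8.0, -8.0])
ambiguous_probs, ambiguous_u = evidential_outputs([-8.0, -8.0, -8.0])
print(confident_u < ambiguous_u)
```

Unlike a plain softmax, which always yields a normalized distribution, this formulation also reports how much total evidence supports the prediction, which is the kind of signal an uncertainty-driven retrieval model could use to flag unreliable inputs.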

Taxonomy

Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques

Methods: Byte Pair Encoding · Absolute Position Encodings · Vision Transformer · Softmax · Label Smoothing · Linear Layer · Adam · Dropout · Layer Normalization · Dense Connections