Visual Words Meet BM25: Sparse Auto-Encoder Visual Word Scoring for Image Retrieval

Donghoon Han; Eunhwan Park; Seunghyeon Seo

arXiv:2603.05781 · cs.CV · March 9, 2026

Open Access

TL;DR

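This paper introduces BM25-V, which applies BM25 scoring to sparse visual words produced by a Sparse Auto-Encoder for image retrieval. BM25's IDF weighting down-weights ubiquitous visual words and emphasizes rare ones; the method reaches Recall@200 ≥ 0.993 across seven benchmarks, and the auto-encoder transfers zero-shot to fine-grained benchmarks.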

Contribution

The paper presents BM25-V, a novel application of BM25 scoring to sparse visual words from an auto-encoder, enhancing retrieval efficiency and interpretability in image retrieval tasks.

Findings

01

Achieves Recall@200 ≥ 0.993 across seven benchmarks.

02

Enables efficient two-stage retrieval that reranks only K = 200 candidates per query.

03

The auto-encoder transfers zero-shot to fine-grained benchmarks.
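
The two-stage retrieval mentioned in the findings can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names (`cosine`, `two_stage_search`), the use of cosine similarity for the dense reranker, and the dictionary-based data layout are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def two_stage_search(sparse_scores, query_dense, gallery_dense, k=200, top_n=10):
    """Hypothetical two-stage pipeline.

    sparse_scores: dict image_id -> first-stage (sparse) score for this query
    query_dense:   dense embedding of the query (assumed available)
    gallery_dense: dict image_id -> dense embedding
    """
    # Stage 1: keep only the k best candidates according to the sparse score.
    candidates = sorted(sparse_scores, key=sparse_scores.get, reverse=True)[:k]
    # Stage 2: rerank just those candidates with the dense similarity.
    reranked = sorted(
        candidates,
        key=lambda image_id: cosine(query_dense, gallery_dense[image_id]),
        reverse=True,
    )
    return reranked[:top_n]
```

The dense model only ever scores `k` images per query, so the cost of the second stage does not grow with the gallery size.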

Abstract

Dense image retrieval is accurate but offers limited interpretability and attribution, and it can be compute-intensive at scale. We present BM25-V, which applies Okapi BM25 scoring to sparse visual-word activations from a Sparse Auto-Encoder (SAE) on Vision Transformer patch features. Across a large gallery, visual-word document frequencies are highly imbalanced and follow a Zipfian-like distribution, making BM25's inverse document frequency (IDF) weighting well suited for suppressing ubiquitous, low-information words and emphasizing rare, discriminative ones. BM25-V retrieves high-recall candidates via sparse inverted-index operations and serves as an efficient first-stage retriever for dense reranking. Across seven benchmarks, BM25-V achieves Recall@200 ≥ 0.993, enabling a two-stage pipeline that reranks only K = 200 candidates per query and recovers near-dense…
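
To make the scoring idea concrete, here is a minimal sketch of Okapi BM25 over an inverted index of visual words. It is an illustration under stated assumptions rather than the paper's method: each image is assumed to be a list of active visual-word ids (one entry per patch activation, so the count of a word acts as its term frequency), the IDF uses the common smoothed form log(1 + (N - df + 0.5) / (df + 0.5)), and `k1 = 1.2`, `b = 0.75` are conventional defaults, not values taken from the paper. The class name `BM25VisualIndex` is hypothetical.

```python
import math
from collections import Counter, defaultdict


class BM25VisualIndex:
    """Toy inverted index that scores images with Okapi BM25 over visual-word ids."""

    def __init__(self, gallery, k1=1.2, b=0.75):
        # gallery: dict image_id -> list of active visual-word ids (assumed format)
        self.k1, self.b = k1, b
        self.n_docs = len(gallery)
        self.doc_len = {image_id: len(words) for image_id, words in gallery.items()}
        self.avg_len = sum(self.doc_len.values()) / max(self.n_docs, 1)
        # Inverted index: visual word -> {image_id: term frequency}
        self.postings = defaultdict(dict)
        for image_id, words in gallery.items():
            for word, tf in Counter(words).items():
                self.postings[word][image_id] = tf

    def idf(self, word):
        # Rare words get a high weight; words present in most images get a low one.
        df = len(self.postings.get(word, {}))
        return math.log(1.0 + (self.n_docs - df + 0.5) / (df + 0.5))

    def search(self, query_words, k=200):
        scores = defaultdict(float)
        for word in set(query_words):
            idf = self.idf(word)
            # Only images that contain the word are touched (sparse lookup).
            for image_id, tf in self.postings.get(word, {}).items():
                length_norm = 1.0 - self.b + self.b * self.doc_len[image_id] / self.avg_len
                scores[image_id] += idf * tf * (self.k1 + 1.0) / (tf + self.k1 * length_norm)
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

A small usage example: with `gallery = {"img1": [0, 0, 7], "img2": [0, 3], "img3": [0, 5]}`, word `0` appears in every image and receives a low IDF, while word `7` appears once and dominates the score, so `BM25VisualIndex(gallery).search([0, 7])` ranks `img1` first.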

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques