Image Hashing via Cross-View Code Alignment in the Age of Foundation Models
Ilyass Moummad, Kawtar Zaher, Hervé Goëau, Alexis Joly

TL;DR
CroVCA is a simple, efficient hashing method that aligns binary codes across semantically aligned views using a lightweight network, reaching state-of-the-art results on large-scale retrieval benchmarks in just 5 epochs.
Contribution
A unified, fast hashing approach that combines cross-view code alignment with a lightweight network, enabling quick training and use in both unsupervised and supervised large-scale retrieval.
Findings
State-of-the-art results achieved in just 5 epochs.
Unsupervised hashing on COCO completes in under 2 minutes.
Supervised hashing on ImageNet100 completes in about 3 minutes.
Abstract
Efficient large-scale retrieval requires representations that are both compact and discriminative. Foundation models provide powerful visual and multimodal embeddings, but nearest neighbor search in these high-dimensional spaces is computationally expensive. Hashing offers an efficient alternative by enabling fast Hamming distance search with binary codes, yet existing approaches often rely on complex pipelines, multi-term objectives, designs specialized for a single learning paradigm, and long training times. We introduce CroVCA (Cross-View Code Alignment), a simple and unified principle for learning binary codes that remain consistent across semantically aligned views. A single binary cross-entropy loss enforces alignment, while coding-rate maximization serves as an anti-collapse regularizer to promote balanced and diverse codes. To implement this, we design HashCoder, a lightweight…
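The abstract names three ingredients: a binary cross-entropy alignment term between views, a coding-rate regularizer against collapse, and Hamming-distance search over the resulting codes. The NumPy sketch below illustrates one plausible reading of each; it is not the authors' implementation. The function names, the symmetric choice of targets (each view's thresholded bits supervise the other view's probabilities), the row normalization, and the value of `eps` are all assumptions made for illustration, and the HashCoder network itself is not modeled.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def bce_alignment(logits_a, logits_b, tiny=1e-7):
    """Symmetric BCE between two views (assumed form).

    Each view's hard bits (logit > 0) act as targets for the other
    view's per-bit probabilities.
    """
    p_a, p_b = sigmoid(logits_a), sigmoid(logits_b)
    t_a = (logits_a > 0).astype(float)
    t_b = (logits_b > 0).astype(float)

    def bce(p, t):
        p = np.clip(p, tiny, 1.0 - tiny)
        return -np.mean(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))

    return 0.5 * (bce(p_a, t_b) + bce(p_b, t_a))


def coding_rate(z, eps=0.5):
    """Coding rate 0.5 * logdet(I + d / (n * eps^2) * Z^T Z) (assumed form).

    Rows are L2-normalized first. Larger values indicate codes that
    spread over more directions; a collapsed batch scores low.
    """
    n, d = z.shape
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)
    cov = (d / (n * eps ** 2)) * (z.T @ z)
    _, logdet = np.linalg.slogdet(np.eye(d) + cov)
    return 0.5 * logdet


def hamming_search(query_bits, db_bits, k):
    """Return indices and Hamming distances of the k closest database codes."""
    dists = np.count_nonzero(db_bits != query_bits, axis=1)
    order = np.argsort(dists, kind="stable")[:k]
    return order, dists[order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    view_a = rng.normal(size=(64, 16))                  # logits for view 1
    view_b = view_a + 0.1 * rng.normal(size=(64, 16))   # logits for view 2
    # Hypothetical combined objective: minimize alignment, maximize coding rate.
    loss = bce_alignment(view_a, view_b) - 0.1 * coding_rate(view_a)
    print("loss:", loss)

    codes = (view_a > 0).astype(np.uint8)
    idx, dist = hamming_search(codes[0], codes, k=3)
    print("nearest:", idx, "distances:", dist)
```

In a sketch like this the regularizer is subtracted from the loss because it is meant to be maximized; the weight 0.1 is arbitrary. At retrieval time only the thresholded bits are needed, which is what makes Hamming search cheap compared with nearest-neighbor search over the original embeddings.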
