TransHash: Transformer-based Hamming Hashing for Efficient Image Retrieval
Yongbiao Chen (1), Sheng Zhang (2), Fangxin Liu (1), Zhigang Chang (1), Mang Ye (3), Zhengwei Qi (1) ((1) Shanghai Jiao Tong University, (2) University of Southern California, (3) Wuhan University)

TL;DR
TransHash is a pure transformer-based deep hashing framework for efficient image retrieval. It combines a siamese Vision Transformer backbone with a Bayesian learning scheme and reports higher retrieval accuracy than CNN-based hashing methods.
Contribution
This work is the first to develop a pure transformer-based deep hashing method for image retrieval, eliminating the need for convolutional neural networks.
Findings
Achieves significant performance gains over state-of-the-art methods.
Demonstrates effectiveness on the CIFAR-10, NUS-WIDE, and ImageNet datasets.
Outperforms existing methods in mean Average Precision (mAP).
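For readers unfamiliar with the metric named above, the following is a minimal sketch of how mean Average Precision is commonly computed from ranked retrieval results. The function names are hypothetical, and the paper's exact evaluation protocol (for example, any cutoff on the number of returned images) is not stated on this page, so this shows only the generic definition.

```python
import numpy as np

def average_precision(relevance):
    """AP for one query. `relevance` is a 0/1 list in ranked order
    (1 = retrieved item is relevant to the query)."""
    rel = np.asarray(relevance, dtype=float)
    n_rel = rel.sum()
    if n_rel == 0:
        return 0.0  # convention assumed here: no relevant items gives AP = 0
    # Precision at each rank k = (relevant items in top k) / k
    precision_at_k = np.cumsum(rel) / np.arange(1, len(rel) + 1)
    # Average the precision values taken at the ranks of relevant items
    return float((precision_at_k * rel).sum() / n_rel)

def mean_average_precision(relevance_lists):
    """mAP: the mean of per-query AP values."""
    return float(np.mean([average_precision(r) for r in relevance_lists]))
```

A ranked list with relevant items at ranks 1 and 3 has precision 1 at rank 1 and 2/3 at rank 3, so its AP is their average, 5/6.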
Abstract
Deep Hamming hashing has gained growing popularity in approximate nearest neighbour search for large-scale image retrieval. Until now, deep hashing for image retrieval has been dominated by convolutional neural network architectures, e.g. ResNet (He et al., 2016). In this paper, inspired by recent advances in vision transformers, we present TransHash, a pure transformer-based framework for deep hash learning. Concretely, our framework is composed of two major modules: (1) Based on the Vision Transformer (ViT), we design a siamese vision transformer backbone for image feature extraction. To learn fine-grained features, we introduce dual-stream feature learning on top of the transformer to learn discriminative global and local features. (2) We adopt a Bayesian learning scheme with a dynamically constructed similarity matrix to…
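To make the retrieval step behind Hamming hashing concrete, here is a minimal sketch of ranking database items by Hamming distance between binary codes. It is not the TransHash model: the feature vectors stand in for whatever a trained network would output, and `binarize`, `hamming_distance`, and `rank_by_hamming` are hypothetical helper names chosen for illustration.

```python
import numpy as np

def binarize(features):
    """Map real-valued features to {-1, +1} hash codes with the sign function."""
    return np.where(np.asarray(features) >= 0, 1, -1).astype(np.int8)

def hamming_distance(query_code, db_codes):
    """Hamming distance between one K-bit code and each row of db_codes.

    For codes in {-1, +1}, d_H(q, b) = (K - <q, b>) / 2, so a single
    matrix-vector product gives all distances at once.
    """
    k = query_code.shape[0]
    inner = db_codes.astype(np.int32) @ query_code.astype(np.int32)
    return (k - inner) // 2

def rank_by_hamming(query_code, db_codes):
    """Database indices sorted from nearest to farthest (ties keep input order)."""
    return np.argsort(hamming_distance(query_code, db_codes), kind="stable")

# Toy usage: three 4-bit database codes and one query.
db = np.array([[1, -1, 1, -1],
               [-1, 1, -1, 1],
               [1, -1, -1, -1]], dtype=np.int8)
query = binarize([0.3, -1.2, 0.8, -0.1])
ranking = rank_by_hamming(query, db)
```

Because the codes are compact and the distance reduces to integer arithmetic, ranking stays cheap even for large databases, which is the efficiency argument for hashing-based retrieval.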
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Video Surveillance and Tracking Methods · Image Retrieval and Classification Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Layer Normalization · Softmax · Dense Connections · Vision Transformer
