Vision Transformer Hashing for Image Retrieval

Shiv Ram Dubey; Satish Kumar Singh; Wei-Ta Chu

arXiv:2109.12564 · cs.CV · March 23, 2022

TL;DR

This paper introduces a Vision Transformer based hashing method (VTS) for image retrieval, leveraging pre-trained ViT and fine-tuning it across multiple frameworks, resulting in superior performance over existing hashing techniques.

Contribution

The paper proposes a novel Vision Transformer based hashing approach that outperforms state-of-the-art methods and demonstrates the effectiveness of ViT as a backbone for image retrieval hashing.

Findings

01

VTS outperforms recent hashing techniques on multiple datasets.

02

The ViT backbone used in VTS outperforms AlexNet and ResNet backbones on retrieval tasks.

03

Experiments on CIFAR10, ImageNet, NUS-Wide, and COCO under six hashing frameworks support the effectiveness of the proposed method.

Abstract

Deep learning has driven tremendous growth in hashing techniques for image retrieval. Recently, the Transformer has emerged as a new architecture that relies on self-attention without convolution. The Transformer has also been extended to the Vision Transformer (ViT) for visual recognition, with promising performance on ImageNet. In this paper, we propose Vision Transformer based Hashing (VTS) for image retrieval. We use a ViT pre-trained on ImageNet as the backbone network and add a hashing head. The proposed VTS model is fine-tuned for hashing under six different image retrieval frameworks, namely Deep Supervised Hashing (DSH), HashNet, GreedyHash, Improved Deep Hashing Network (IDHN), Deep Polarized Network (DPN), and Central Similarity Quantization (CSQ), each with its own objective function. We perform extensive experiments on the CIFAR10, ImageNet, NUS-Wide, and COCO datasets. The proposed…
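The "backbone plus hashing head" design described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the feature vector stands in for a ViT embedding, the head is assumed to be a single linear layer, and binarizing by sign at retrieval time is a common deep-hashing convention assumed here. The function names (`hashing_head`, `binarize`, `rank_by_hamming`) are hypothetical.

```python
from typing import List, Sequence


def hashing_head(features: Sequence[float],
                 weights: Sequence[Sequence[float]],
                 bias: Sequence[float]) -> List[float]:
    """Assumed linear head: maps a backbone feature vector to K real-valued logits."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def binarize(logits: Sequence[float]) -> List[int]:
    """Turn real-valued logits into a K-bit code by thresholding at zero."""
    return [1 if v >= 0 else 0 for v in logits]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of bit positions where two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def rank_by_hamming(query_code: Sequence[int],
                    database_codes: Sequence[Sequence[int]]) -> List[int]:
    """Return database indices ordered from nearest to farthest code."""
    return sorted(range(len(database_codes)),
                  key=lambda i: hamming(query_code, database_codes[i]))


# Toy example: a 2-dimensional "embedding" mapped to a 3-bit code.
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]  # illustrative values only
bias = [0.0, 0.0, 0.0]
query_code = binarize(hashing_head([2.0, -1.0], weights, bias))
database_codes = [[0, 1, 0], [1, 0, 1], [1, 1, 1]]
ranking = rank_by_hamming(query_code, database_codes)
```

In a real system the features would come from the fine-tuned backbone, and the head's weights would be learned under one of the six objectives listed above; only the binarize-and-rank step at retrieval time is shown here.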

Code & Models

Repositories

shivram1987/visiontransformerhashing
PyTorch · Official

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Video Surveillance and Tracking Methods

Methods: Attention Is All You Need · Linear Layer · 1x1 Convolution · Batch Normalization · Average Pooling · Max Pooling · Residual Block · Bottleneck Residual Block · Dropout