# Collaborative Quantization for Cross-Modal Similarity Search

**Authors:** Ting Zhang, Jingdong Wang

arXiv: 1902.00623 · 2019-02-05

## TL;DR

This paper introduces a cross-modal quantization method that jointly learns quantizers for images and texts in a learned common space. Search uses the Euclidean distance in that space, computed with fast distance table lookup, and the method reaches state-of-the-art performance on three benchmark datasets.

## Contribution

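The paper is among the early attempts to bring quantization to cross-modal search. It jointly learns the quantizers for both modalities by aligning the quantized representations of each image-text pair belonging to a document, and it simultaneously learns the common space in which quantization is performed.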

## Key findings

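- Achieves state-of-the-art performance on three benchmark datasets in comparison with several competitive algorithms.
- Supports efficient search: Euclidean distances in the common space are computed with fast distance table lookup.
- Aligns the quantized representations of each image-text pair, which ties the two modality-specific quantizers together.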

## Abstract

Cross-modal similarity search is the problem of designing a search system that supports querying across content modalities, e.g., using an image to search for texts or using a text to search for images. This paper presents a compact coding solution for efficient search, with a focus on the quantization approach, which has already shown superior performance over hashing solutions in single-modal similarity search. We propose a cross-modal quantization approach, which is among the early attempts to introduce quantization into cross-modal search. The major contribution lies in jointly learning the quantizers for both modalities by aligning the quantized representations for each pair of image and text belonging to a document. In addition, our approach simultaneously learns the common space for both modalities in which quantization is conducted, enabling efficient and effective search using the Euclidean distance computed in the common space with fast distance table lookup. Experimental results compared with several competitive algorithms over three benchmark datasets demonstrate that the proposed approach achieves state-of-the-art performance.
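
The "fast distance table lookup" mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a product-quantization-style encoding (one codeword per subvector), codebooks that are already given rather than learned, and a query that has already been mapped into the common space. All function names here are hypothetical.

```python
import random


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(vec, num_books):
    """Cut a vector into num_books contiguous subvectors of equal length."""
    sub_dim = len(vec) // num_books
    return [vec[m * sub_dim:(m + 1) * sub_dim] for m in range(num_books)]


def encode(vec, codebooks):
    """Quantize: store, per codebook, the index of the nearest codeword."""
    codes = []
    for sub, book in zip(split(vec, len(codebooks)), codebooks):
        codes.append(min(range(len(book)), key=lambda k: squared_l2(sub, book[k])))
    return codes


def decode(codes, codebooks):
    """Rebuild the quantized vector by concatenating the chosen codewords."""
    out = []
    for m, k in enumerate(codes):
        out.extend(codebooks[m][k])
    return out


def build_distance_table(query, codebooks):
    """table[m][k] = squared distance from query subvector m to codeword k."""
    subs = split(query, len(codebooks))
    return [[squared_l2(sub, cw) for cw in book] for sub, book in zip(subs, codebooks)]


def table_distance(table, codes):
    """Distance to a quantized item: one table lookup per codebook, then a sum."""
    return sum(table[m][k] for m, k in enumerate(codes))


# Toy data (assumption): 3 codebooks of 4 codewords each, 6-dimensional common space.
rng = random.Random(0)
codebooks = [[[rng.random() for _ in range(2)] for _ in range(4)] for _ in range(3)]
text_items = [[rng.random() for _ in range(6)] for _ in range(5)]  # e.g. embedded texts
image_query = [rng.random() for _ in range(6)]                     # e.g. embedded image

database = [encode(item, codebooks) for item in text_items]
table = build_distance_table(image_query, codebooks)  # built once per query
ranking = sorted(range(len(database)), key=lambda i: table_distance(table, database[i]))
print(ranking)
```

The table is computed once per query, so scoring each database item costs only one lookup per codebook instead of a full distance computation. In this sketch the looked-up sum equals the exact squared Euclidean distance between the query and the item's quantized reconstruction.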

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.00623/full.md

## Figures

17 figures with captions in the complete paper: https://tomesphere.com/paper/1902.00623/full.md

## References

39 references — full list in the complete paper: https://tomesphere.com/paper/1902.00623/full.md

---
Source: https://tomesphere.com/paper/1902.00623