MUST: An Effective and Scalable Framework for Multimodal Search of Target Modality

Mengzhao Wang; Xiangyu Ke; Xiaoliang Xu; Lu Chen; Yunjun Gao; Pinpin Huang; Runkai Zhu

arXiv:2312.06397·cs.DB·December 12, 2023·2 cites

TL;DR

MUST is a scalable, efficient framework for multimodal search that intelligently fuses multiple modalities using learned weights and a fused proximity graph, significantly improving accuracy and speed over baseline methods.

Contribution

The paper introduces MUST, a novel multimodal search framework that employs hybrid fusion, vector weight learning, and a fused proximity graph for improved accuracy and efficiency.
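As a rough illustration of why per-modality weights matter, the sketch below scores objects by a weighted sum of per-modality inner products. It is not the paper's method: the weights here are hand-picked hypothetical values rather than learned ones, the names `weighted_similarity` and `rank` are invented for this example, and the search is brute force rather than a fused proximity graph.

```python
from typing import Dict, List, Sequence

MultiVector = Dict[str, Sequence[float]]  # modality name -> embedding


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def weighted_similarity(query: MultiVector, obj: MultiVector,
                        weights: Dict[str, float]) -> float:
    """Weighted sum of per-modality inner products (weights are assumed given)."""
    return sum(weights[m] * dot(query[m], obj[m]) for m in query)


def rank(query: MultiVector, objects: List[MultiVector],
         weights: Dict[str, float]) -> List[int]:
    """Brute-force ranking of object ids, most similar first."""
    scores = [weighted_similarity(query, o, weights) for o in objects]
    return sorted(range(len(objects)), key=lambda i: (-scores[i], i))


# Toy data: three objects, each with an image and a text embedding.
objects = [
    {"image": [1.0, 0.0], "text": [0.0, 1.0]},
    {"image": [0.0, 1.0], "text": [1.0, 0.0]},
    {"image": [0.6, 0.8], "text": [0.8, 0.6]},
]
query = {"image": [1.0, 0.0], "text": [1.0, 0.0]}

image_heavy = rank(query, objects, {"image": 0.9, "text": 0.1})
text_heavy = rank(query, objects, {"image": 0.1, "text": 0.9})
```

On this toy data the top result flips from object 0 to object 1 when the weight moves from the image modality to the text modality, which is the sensitivity that motivates learning the weights instead of fixing them by hand.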

Findings

01. Achieves over 10x faster search times compared to baselines.

02. Attains an average of 93% higher accuracy in multimodal retrieval.

03. Scales effectively to datasets with over 10 million elements.

Abstract

We investigate the problem of multimodal search of target modality, where the task involves enhancing a query in a specific target modality by integrating information from auxiliary modalities. The goal is to retrieve relevant objects whose contents in the target modality match the specified multimodal query. The paper first introduces two baseline approaches that integrate techniques from the Database, Information Retrieval, and Computer Vision communities. These baselines either merge the results of separate vector searches for each modality or perform a single-channel vector search by fusing all modalities. However, both baselines have limitations in terms of efficiency and accuracy as they fail to adequately consider the varying importance of fusing information across modalities. To overcome these limitations, the paper proposes a novel framework, called MUST. Our framework employs…
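The two baselines described in the abstract can be sketched in a few lines. This is a minimal brute-force illustration under stated assumptions, not the paper's implementation: the function names `merge_baseline` and `fused_baseline` are hypothetical, inner product is assumed as the similarity measure, the merge step (union of per-modality candidates re-ranked by summed similarity) is one plausible choice among several, and the fused variant simply concatenates unweighted vectors.

```python
from typing import Dict, List, Sequence

MultiVector = Dict[str, Sequence[float]]  # modality name -> embedding


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(scores: Dict[int, float], k: int) -> List[int]:
    """Ids of the k highest-scoring objects; ties broken by smaller id."""
    return sorted(scores, key=lambda i: (-scores[i], i))[:k]


def merge_baseline(query: MultiVector, objects: List[MultiVector], k: int) -> List[int]:
    """Search each modality separately, then merge the candidate lists."""
    candidates = set()
    for modality, q_vec in query.items():
        scores = {i: dot(q_vec, o[modality]) for i, o in enumerate(objects)}
        candidates.update(top_k(scores, k))
    # Assumed merge rule: re-rank the union by summed per-modality similarity.
    total = {i: sum(dot(query[m], objects[i][m]) for m in query) for i in candidates}
    return top_k(total, k)


def fused_baseline(query: MultiVector, objects: List[MultiVector], k: int) -> List[int]:
    """Concatenate all modalities into one vector and run a single search."""
    modalities = sorted(query)

    def concat(mv: MultiVector) -> List[float]:
        return [x for m in modalities for x in mv[m]]

    q = concat(query)
    scores = {i: dot(q, concat(o)) for i, o in enumerate(objects)}
    return top_k(scores, k)


# Toy data: object 2 is a moderate match in both modalities.
objects = [
    {"image": [1.0, 0.0], "text": [0.0, 1.0]},
    {"image": [0.0, 1.0], "text": [1.0, 0.0]},
    {"image": [0.6, 0.8], "text": [0.8, 0.6]},
]
query = {"image": [1.0, 0.0], "text": [1.0, 0.0]}

merged = merge_baseline(query, objects, k=1)
fused = fused_baseline(query, objects, k=1)
```

With `k=1` the merge baseline returns object 0 and never sees object 2, because object 2 is not the single best match in either modality on its own, while the fused search returns object 2 as the best joint match. The fused variant in turn treats every modality as equally important, which is the limitation the abstract attributes to both baselines.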

Code & Models

Repositories

zju-daily/must · PyTorch · Official

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Remote-Sensing Image Classification