Generative Multi-Modal Knowledge Retrieval with Large Language Models

Xinwei Long; Jiali Zeng; Fandong Meng; Zhiyuan Ma; Kaiyan Zhang; Bowen Zhou; Jie Zhou

arXiv:2401.08206 · cs.IR · January 17, 2024 · 2 cites

TL;DR

This paper introduces an end-to-end generative framework leveraging large language models for multi-modal knowledge retrieval, improving effectiveness and training efficiency in handling multi-modal queries.

Contribution

It proposes a novel approach combining object-aware prefix-tuning and knowledge-guided generation to enhance multi-modal knowledge retrieval with LLMs.

Findings

01. Achieved 3.0% to 14.6% improvements on three benchmarks.

02. Effectively aligns multi-grained visual features into textual space.

03. Demonstrates the effectiveness of the proposed framework over strong baselines.

Abstract

Knowledge retrieval with multi-modal queries plays a crucial role in supporting knowledge-intensive multi-modal applications. However, existing methods face challenges in terms of their effectiveness and training efficiency, especially when it comes to training and integrating multiple retrievers to handle multi-modal queries. In this paper, we propose an innovative end-to-end generative framework for multi-modal knowledge retrieval. Our framework takes advantage of the fact that large language models (LLMs) can effectively serve as virtual knowledge bases, even when trained with limited data. We retrieve knowledge via a two-step process: 1) generating knowledge clues related to the queries, and 2) obtaining the relevant document by searching databases using the knowledge clue. In particular, we first introduce an object-aware prefix-tuning technique to guide multi-grained visual…
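
The two-step process described in the abstract (generate a knowledge clue, then search the database with it) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the corpus, the `toy_generate_clue` stand-in for the LLM, and the `retrieve` helper are all hypothetical names introduced here, the image part of the query is omitted, and the database search is assumed to be a plain case-insensitive substring match.

```python
from typing import Callable, Dict, List

# Hypothetical toy corpus (document id -> text); not taken from the paper.
CORPUS: Dict[str, str] = {
    "doc1": "The Eiffel Tower is a wrought-iron lattice tower in Paris.",
    "doc2": "The Golden Gate Bridge spans the strait in San Francisco.",
}


def toy_generate_clue(query: str) -> str:
    """Stand-in for the LLM: returns a canned clue for a known query.

    A real system would condition on both the image and the text of the
    query; this stub only looks at the text (assumption for illustration).
    """
    canned = {"Where is this tower located?": "wrought-iron lattice tower"}
    return canned.get(query, "")


def retrieve(
    query: str,
    generate_clue: Callable[[str], str],
    corpus: Dict[str, str],
) -> List[str]:
    """Step 1: generate a knowledge clue. Step 2: look it up in the corpus."""
    clue = generate_clue(query).strip().lower()
    if not clue:
        # No clue generated, so there is nothing to search for.
        return []
    # Assumed search strategy: case-insensitive substring match.
    return [doc_id for doc_id, text in corpus.items() if clue in text.lower()]


hits = retrieve("Where is this tower located?", toy_generate_clue, CORPUS)
print(hits)
```

The point of the sketch is the design choice the abstract highlights: only one model produces the clue, and the lookup itself needs no separately trained retriever.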

Code & Models

Repositories

xinwei666/mmgenerativeir (PyTorch, official)

Videos

Generative Multi-Modal Knowledge Retrieval with Large Language Models · Underline

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Natural Language Processing Techniques

Methods: ALIGN