Pre-Training Multi-Modal Dense Retrievers for Outside-Knowledge Visual Question Answering

Alireza Salemi; Mahta Rafiee; Hamed Zamani

arXiv:2306.16478·cs.IR·June 30, 2023

TL;DR

This paper introduces a pre-training method for multi-modal dense retrievers in outside-knowledge visual question answering, significantly improving retrieval accuracy and zero-shot performance.

Contribution

It proposes an automatic data generation pipeline for pre-training passage retrieval models, enhancing retrieval effectiveness in OK-VQA tasks.

Findings

01 A 26.9% improvement in Precision@5 over the current state-of-the-art asymmetric architecture

02 Effective zero-shot retrieval performance

03 Enhanced retrieval accuracy for multi-modal queries

Abstract

This paper studies a category of visual question answering tasks in which accessing external knowledge is necessary for answering the questions. This category is called outside-knowledge visual question answering (OK-VQA). A major step in developing OK-VQA systems is to retrieve relevant documents for the given multi-modal query. The current state-of-the-art asymmetric dense retrieval model for this task uses an architecture with a multi-modal query encoder and a uni-modal document encoder. Such an architecture requires a large amount of training data for effective performance. We propose an automatic data generation pipeline for pre-training passage retrieval models for OK-VQA tasks. The proposed approach leads to a 26.9% improvement in Precision@5 compared to the current state-of-the-art asymmetric architecture. Additionally, the proposed pre-training approach exhibits a good ability in…
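
As a rough illustration of the retrieval setting and the Precision@5 metric mentioned in the abstract, the following minimal Python sketch ranks passages by similarity to a query vector and scores the top results. It is not the authors' implementation: the toy vectors stand in for the outputs of the multi-modal query encoder and the uni-modal document encoder, dot-product scoring is an assumption (it is common in dense retrieval but is not stated on this page), and the function names `rank_passages` and `precision_at_k` are hypothetical.

```python
from typing import List, Sequence, Set


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_passages(query_vec: Sequence[float],
                  passage_vecs: Sequence[Sequence[float]]) -> List[int]:
    """Return passage indices ordered by descending similarity score.

    query_vec stands in for the multi-modal (image + question) query
    encoding; passage_vecs stand in for text-only passage encodings.
    Dot-product scoring is an assumption made for this sketch.
    """
    scores = [dot(query_vec, p) for p in passage_vecs]
    return sorted(range(len(passage_vecs)), key=lambda i: scores[i], reverse=True)


def precision_at_k(ranked_ids: Sequence[int], relevant_ids: Set[int], k: int = 5) -> float:
    """Fraction of the top-k ranked passages that are relevant."""
    top_k = ranked_ids[:k]
    hits = sum(1 for i in top_k if i in relevant_ids)
    return hits / k


# Toy data: one query and six passages in a 2-dimensional embedding space.
query = [1.0, 0.0]
passages = [
    [0.9, 0.1],
    [0.1, 0.9],
    [0.5, 0.5],
    [0.8, 0.0],
    [0.0, 1.0],
    [0.7, 0.2],
]
relevant = {0, 4, 5}  # hypothetical relevance labels

ranking = rank_passages(query, passages)
p_at_5 = precision_at_k(ranking, relevant, k=5)
print(ranking, p_at_5)
```

In this toy case two of the five top-ranked passages are labeled relevant, so Precision@5 is 0.4. The evaluation protocol used in the paper may differ in how relevance is judged.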

Code & Models

Repositories

alirezasalemi7/pretraining-multimodal-dense-retriever-for-okvqa (Official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning