Progressive Evidence Refinement for Open-domain Multimodal Retrieval Question Answering
Shuwen Yang, Anran Wu, Xingjiao Wu, Luwei Xiao, Tianlong Ma, Cheng Jin, Liang He

TL;DR
This paper introduces a progressive evidence refinement framework with semi-supervised contrastive learning and multi-turn retrieval for open-domain multimodal question answering, significantly improving evidence selection and answer accuracy.
Contribution
It proposes a novel two-stage evidence refinement and multi-turn retrieval framework with contrastive learning to better utilize fine-grained multimodal evidence in QA tasks.
Findings
Achieved state-of-the-art results on the WebQA and MultimodalQA benchmarks.
Demonstrated improved evidence selection and answer accuracy.
Validated effectiveness of the multi-turn retrieval strategy.
Abstract
Pre-trained multimodal models have achieved significant success in retrieval-based question answering. However, current multimodal retrieval question-answering models face two main challenges. Firstly, utilizing compressed evidence features as input to the model results in the loss of fine-grained information within the evidence. Secondly, a gap exists between the feature extraction of evidence and the question, which hinders the model from effectively extracting critical features from the evidence based on the given question. We propose a two-stage framework for evidence retrieval and question-answering to alleviate these issues. First and foremost, we propose a progressive evidence refinement strategy for selecting crucial evidence. This strategy employs an iterative evidence retrieval approach to uncover the logical sequence among the evidence pieces. It incorporates two rounds of…
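The iterative retrieval idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`embed`, `cosine`, `iterative_retrieve`), the bag-of-words scoring, and the query-expansion step are all assumptions made for illustration, standing in for a learned multimodal encoder and the paper's actual refinement procedure. The sketch only shows the general pattern of selecting evidence over several turns, where each turn is conditioned on what was picked before.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a learned multimodal encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def iterative_retrieve(question, candidates, turns=2):
    """Pick one evidence item per turn, conditioning each turn on earlier picks."""
    query = question
    remaining = list(candidates)
    selected = []
    for _ in range(turns):
        if not remaining:
            break
        query_vec = embed(query)
        best = max(remaining, key=lambda c: cosine(query_vec, embed(c)))
        selected.append(best)
        remaining.remove(best)
        # Hypothetical conditioning step: expand the query with the chosen evidence
        # so the next turn can surface evidence that only links to it indirectly.
        query = query + " " + best
    return selected


if __name__ == "__main__":
    question = "which river flows through the city where the eiffel tower stands"
    candidates = [
        "mount fuji is in japan",
        "the seine flows through paris",
        "the eiffel tower stands in paris",
    ]
    for rank, evidence in enumerate(iterative_retrieve(question, candidates), start=1):
        print(rank, evidence)
```

In this toy run the first turn selects the Eiffel Tower sentence, and the expanded query then favors the Seine sentence through the shared word "paris", which is the kind of chained dependency between evidence pieces that a single-pass ranking can miss.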
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
Methods: Contrastive Learning
