Multimodal Multihop Source Retrieval for Web Question Answering
Navya Yarrabelly, Saloni Mittal

TL;DR
This paper introduces a graph reasoning network that leverages semantic sentence structure to improve multi-modal multi-hop question answering by efficiently retrieving supporting facts across images and text, outperforming transformer baselines.
Contribution
The paper presents a novel graph-based approach that enhances multi-modal multi-hop QA by utilizing semantic graph structures and adjacency matrices, reducing reliance on large transformers.
Findings
Graph structure improves retrieval performance.
Message propagation can replace large transformers.
Achieved a 4.6% higher retrieval F1 score than the transformer baselines.
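For readers unfamiliar with the metric, the sketch below shows one common way to compute a retrieval F1 score for a single question: compare the set of retrieved sources against the set of gold supporting sources. This summary does not spell out the evaluation protocol, so the set-based definition, the function name `retrieval_f1`, and the source ids are assumptions made for illustration.

```python
def retrieval_f1(predicted, gold):
    """Set-based F1 between predicted and gold supporting sources.

    Assumption: sources are identified by hashable ids (hypothetical
    names such as "img_1" or "txt_3"); the benchmark's exact scoring
    may differ from this plain definition.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # No overlap (including empty inputs) gives an F1 of zero.
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# One of two retrieved sources is correct, one of two gold sources is found.
score = retrieval_f1(["img_1", "txt_3"], ["img_1", "txt_2"])
```

Here precision and recall are both 0.5, so `score` is 0.5. A dataset-level figure would typically average such per-question scores, though the averaging scheme is also an assumption here.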
Abstract
This work deals with the challenge of learning and reasoning over multi-modal multi-hop question answering (QA). We propose a graph reasoning network based on the semantic structure of the sentences to learn multi-source reasoning paths and find the supporting facts across both image and text modalities for answering the question. In this paper, we investigate the importance of graph structure for multi-modal multi-hop question answering. Our analysis is centered on WebQA. We construct a strong baseline model that finds relevant sources using a pairwise classification task. We establish that, with the proper use of feature representations from pre-trained models, graph structure helps in improving multi-modal multi-hop question answering. We point out that both graph structure and adjacency matrix are task-related prior knowledge, and graph structure can be leveraged to improve the…
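To make the role of the adjacency matrix concrete, here is a minimal sketch of one round of message propagation over a small graph containing a question node and two candidate source nodes. It is a generic mean-aggregation step, not the authors' network: the function name `propagate`, the toy features, the self-loop, and the edge layout are all assumptions, since this summary does not describe the actual graph construction or update rule.

```python
def propagate(features, adjacency):
    """One round of mean-aggregation message passing.

    features:  list of node feature vectors (lists of floats)
    adjacency: square 0/1 matrix; adjacency[i][j] == 1 means node j
               sends a message to node i
    Assumption: every node also keeps a self-loop, so its own
    features take part in the average.
    """
    n = len(features)
    dim = len(features[0])
    updated = []
    for i in range(n):
        # Collect the node itself plus every neighbour that points to it.
        senders = [j for j in range(n) if j == i or adjacency[i][j]]
        updated.append(
            [sum(features[j][d] for j in senders) / len(senders)
             for d in range(dim)]
        )
    return updated


# Node 0 is the question; nodes 1 and 2 are candidate sources
# (for example one image and one text snippet).
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
adjacency = [
    [0, 1, 1],  # the question receives from both sources
    [1, 0, 0],  # source 1 receives from the question
    [1, 0, 0],  # source 2 receives from the question
]
updated = propagate(features, adjacency)
```

After one round, each source vector has mixed in the question's features, which is the sense in which the adjacency matrix encodes task-related prior knowledge about which nodes should exchange information. A learned model would normally add trainable weights and a nonlinearity to this step.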
Taxonomy
Topics: Topic Modeling · Expert finding and Q&A systems · Web Data Mining and Analysis
