An Entailment Tree Generation Approach for Multimodal Multi-Hop Question Answering with Mixture-of-Experts and Iterative Feedback Mechanism

Qing Zhang; Haocheng Lv; Jie Liu; Zhiyun Chen; Jianyong Duan; Hao Wang; Li He; Mingying Xv

arXiv:2412.05821·cs.CL·December 11, 2024

TL;DR

This paper introduces a novel multimodal multi-hop question answering method that generates entailment trees using a mixture-of-experts framework and iterative feedback, improving interpretability and accuracy.

Contribution

It proposes a joint entailment tree generation and question answering approach, built on a mixture-of-experts framework with an iterative feedback mechanism, that addresses two problems in multimodal QA: redundant retrieved evidence and the lack of interpretable reasoning steps.
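As a rough illustration only, assuming the common definition of an entailment tree (leaves are evidence sentences, each internal node is an intermediate conclusion entailed by its children, and the root is the hypothesis formed from the question and answer), the sketch below shows why such a tree helps with both problems: every reasoning step can be read off and checked, and retrieved evidence that never appears as a leaf is exposed as unused. `Node`, `leaves`, `steps`, and the toy evidence are hypothetical; this is not the authors' implementation and omits the mixture-of-experts and iterative feedback components entirely.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of an entailment tree (hypothetical structure for illustration)."""
    text: str
    children: list[Node] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def leaves(node: Node) -> list[str]:
    """Return the evidence sentences the tree actually uses, left to right."""
    if node.is_leaf():
        return [node.text]
    used: list[str] = []
    for child in node.children:
        used.extend(leaves(child))
    return used


def steps(node: Node) -> list[str]:
    """Return entailment steps bottom-up as 'premise & premise -> conclusion'."""
    if node.is_leaf():
        return []
    out: list[str] = []
    for child in node.children:
        out.extend(steps(child))  # conclusions below must be derived first
    premises = " & ".join(child.text for child in node.children)
    out.append(f"{premises} -> {node.text}")
    return out


# Toy retrieved evidence (hypothetical); the last sentence is irrelevant.
evidence = [
    "The Eiffel Tower is in Paris.",
    "Paris is in France.",
    "France is in Europe.",
    "The Louvre houses the Mona Lisa.",
]

# Two-hop tree: an intermediate conclusion feeds the root hypothesis.
hop1 = Node("The Eiffel Tower is in France.", [Node(evidence[0]), Node(evidence[1])])
root = Node("The Eiffel Tower is in Europe.", [hop1, Node(evidence[2])])

unused = [e for e in evidence if e not in leaves(root)]

if __name__ == "__main__":
    for step in steps(root):
        print(step)
    print("Unused evidence:", unused)
```

Running it prints the two hops in order and lists the Louvre sentence as unused. Post-order traversal is chosen so that each intermediate conclusion appears before any step that uses it as a premise.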

Findings

01

Achieved first place on the WebQA leaderboard.

02

Demonstrated competitive results on MultimodalQA.

03

Enhanced interpretability and accuracy through entailment tree generation.

Abstract

With the rise of large-scale language models (LLMs), converting multimodal information into text descriptions has become a popular and effective strategy for multimodal multi-hop question answering. However, we argue that current multimodal multi-hop question answering methods still face two main challenges: 1) Retrieved evidence contains a large amount of redundant information, and this irrelevant information inevitably misleads the prediction, causing a significant drop in performance. 2) Without interpretable reasoning steps, it is difficult to discover logical errors in the model's reasoning when it handles complex questions. To solve these problems, we propose a unified LLM-based approach that avoids relying heavily on LLMs because of their potential errors, and innovatively treat multimodal multi-hop question answering as a joint entailment tree generation and…
