MMHQA-ICL: Multimodal In-context Learning for Hybrid Question Answering over Text, Tables and Images

Weihao Liu, Fangyu Lei, Tongxu Luo, Jiahe Lei, Shizhu He, Jun Zhao, and Kang Liu

arXiv:2309.04790 · cs.CL · September 12, 2023 · 2 cites

Open Access

TL;DR

This paper introduces MMHQA-ICL, a novel framework utilizing in-context learning with LLMs for hybrid question answering over text, tables, and images, achieving state-of-the-art results in few-shot settings.

Contribution

It presents the first end-to-end LLM prompting method for multimodal hybrid QA, incorporating a heterogeneous data retriever, image captioning, and type-specific in-context learning strategies.

Findings

01 Outperforms all baselines on the MultimodalQA dataset

02 Achieves state-of-the-art results in the few-shot setting

03 Demonstrates the effectiveness of end-to-end LLM prompting for multimodal QA

Abstract

In the real world, knowledge often exists in multimodal and heterogeneous forms. Question answering over hybrid data types, including text, tables, and images (MMHQA), is a challenging task. Recently, with the rise of large language models (LLMs), in-context learning (ICL) has become the most popular way to solve QA problems. We propose the MMHQA-ICL framework to address this problem; it includes a stronger heterogeneous data retriever and an image captioning module. Most importantly, we propose a Type-specific In-context Learning Strategy for MMHQA, which enables LLMs to bring their strong capabilities to bear on this task. We are the first to use an end-to-end LLM prompting method for this task. Experimental results demonstrate that our framework outperforms all baselines, as well as methods trained on the full dataset, achieving state-of-the-art results under the few-shot setting…
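The abstract names three components (a heterogeneous data retriever, an image captioning module, and a type-specific in-context learning strategy) without spelling out how they fit together. The Python sketch below illustrates one plausible way to assemble a type-specific few-shot prompt once retrieval has returned evidence: images are assumed to have already been replaced by captions, tables are linearized row by row, and a question type selects which demonstrations go into the prompt. Everything here (`Evidence`, `question_type`, the `EXEMPLARS` bank, the type labels, and the prompt template) is a hypothetical illustration, not the authors' implementation; the paper's actual retriever, captioner, question typing, and prompts may differ.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical evidence container: text passages are strings, tables are
# lists of rows (first row = header), and images are represented by a caption
# string that some captioning model is assumed to have produced already.
@dataclass
class Evidence:
    modality: str                         # "text", "table" or "image"
    content: Union[str, list[list[str]]]


def linearize_table(rows: list[list[str]]) -> str:
    """Flatten a table into pipe-separated lines so an LLM can read it as text."""
    return "\n".join(" | ".join(row) for row in rows)


def evidence_to_text(ev: Evidence) -> str:
    """Render one retrieved item of any modality as plain text."""
    if ev.modality == "text":
        return f"[Text] {ev.content}"
    if ev.modality == "table":
        return "[Table]\n" + linearize_table(ev.content)
    if ev.modality == "image":
        return f"[Image caption] {ev.content}"
    raise ValueError(f"unknown modality: {ev.modality}")


def question_type(evidence: list[Evidence]) -> str:
    """Hypothetical router: one type per single modality, else a generic multi-hop type."""
    modalities = {ev.modality for ev in evidence}
    if len(modalities) == 1:
        return modalities.pop()
    return "multi_hop"


# Hypothetical few-shot demonstrations, one small pool per question type.
EXEMPLARS: dict[str, list[str]] = {
    "text": ["Context: [Text] Paris is the capital of France.\n"
             "Question: What is the capital of France?\nAnswer: Paris"],
    "table": ["Context: [Table]\nYear | Winner\n1998 | France\n"
              "Question: Who won in 1998?\nAnswer: France"],
    "image": ["Context: [Image caption] A red double-decker bus.\n"
              "Question: What color is the bus?\nAnswer: red"],
    "multi_hop": ["Context: [Text] The film was directed by Ann Lee.\n"
                  "[Table]\nDirector | Born\nAnn Lee | 1970\n"
                  "Question: When was the film's director born?\nAnswer: 1970"],
}


def build_prompt(question: str, evidence: list[Evidence], k: int = 1) -> str:
    """Pick k demonstrations for the predicted type, then append the serialized evidence."""
    demos = EXEMPLARS[question_type(evidence)][:k]
    context = "\n".join(evidence_to_text(ev) for ev in evidence)
    parts = ["Answer the question using the context.", *demos,
             f"Context: {context}\nQuestion: {question}\nAnswer:"]
    return "\n\n".join(parts)


if __name__ == "__main__":
    evidence = [
        Evidence("text", "The 2010 final was played in Johannesburg."),
        Evidence("table", [["Year", "Champion"], ["2010", "Spain"]]),
    ]
    print(build_prompt("Which team won the final played in Johannesburg?", evidence))
```

Passing only single-modality evidence (for example, one captioned image) routes the question to that modality's demonstration pool instead. A real system would more likely ask the LLM or a trained classifier to predict the question type rather than inferring it from which modalities the retriever returned.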

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning