CT2C-QA: Multimodal Question Answering over Chinese Text, Table and Chart

Bowen Zhao; Tianhao Cheng; Yuejie Zhang; Ying Cheng; Rui Feng; Xiaobo Zhang

arXiv:2410.21414·cs.CL·October 30, 2024

TL;DR

This paper introduces CT2C-QA, a comprehensive Chinese multimodal QA dataset combining text, tables, and charts, along with a multi-agent system called AED for reasoning, highlighting current models' limitations in handling such complex data.

Contribution

The paper presents the first Chinese multimodal QA dataset with text, tables, and charts, and proposes a multi-agent reasoning system, AED, to improve analysis and decision-making.

Findings

01. Current models, including GPT-4, underperform on the dataset.

02. The AED system outperforms existing models in multimodal reasoning tasks.

03. The dataset effectively tests models' ability to analyze diverse data modalities.

Abstract

Multimodal Question Answering (MMQA) is crucial as it enables comprehensive understanding and accurate responses by integrating insights from diverse data representations such as tables, charts, and text. Most existing research in MMQA focuses on only two modalities, such as image-text QA, table-text QA, and chart-text QA, and studies that jointly analyze text, tables, and charts remain scarce. In this paper, we present C$T^{2}$C-QA, a pioneering Chinese reasoning-based QA dataset that includes an extensive collection of text, tables, and charts, meticulously compiled from 200 selectively sourced webpages. Our dataset simulates real webpages and serves as a strong test of a model's ability to analyze and reason over multimodal data, because the answer to a question could appear in various modalities, or even potentially not exist at…

Taxonomy

Methods: Attention Is All You Need · Linear Layer · Dense Connections · Label Smoothing · Layer Normalization · Residual Connection · Byte Pair Encoding · Absolute Position Encodings · Multi-Head Attention · Softmax