MULTITAT: Benchmarking Multilingual Table-and-Text Question Answering

Xuanliang Zhang; Dingzirui Wang; Keyan Xu; Qingfu Zhu; Wanxiang Che

arXiv:2502.17253 · cs.CL · February 25, 2025

TL;DR

This paper introduces MULTITAT, the first multilingual dataset for table-and-text question answering (TATQA). It shows that performance on non-English data drops by an average of 19.4% compared to English, and proposes a baseline that outperforms other baselines by 3.3 points on average.

Contribution

The paper builds MULTITAT by sampling data from 3 mainstream TATQA datasets and translating it into 10 diverse languages, and develops a baseline that aligns models' English TATQA capabilities with other languages.

Findings

01. Performance on non-English data drops by an average of 19.4% compared to English

02. The proposed baseline outperforms other baselines by 3.3 points on average

03. Multilingual TATQA poses challenges that English-only datasets overlook
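
The first finding reports an average drop compared to English. As a rough illustration of how such a figure could be computed, the sketch below averages the per-language gap against an English score. The scores, the language codes, and the function name `average_drop` are hypothetical placeholders, not results or code from the paper. The summary here does not say whether 19.4% is a relative drop or a difference in absolute points, so the sketch returns both.

```python
from statistics import mean


def average_drop(scores: dict[str, float], reference: str = "en") -> tuple[float, float]:
    """Return (mean absolute drop in points, mean relative drop in percent)
    of every non-reference language against the reference language."""
    ref = scores[reference]
    others = [s for lang, s in scores.items() if lang != reference]
    if not others:
        raise ValueError("need at least one non-reference language")
    # Absolute drop: difference in score points, averaged over languages.
    absolute = mean(ref - s for s in others)
    # Relative drop: each gap as a percentage of the reference score, then averaged.
    relative = mean((ref - s) / ref * 100 for s in others)
    return absolute, relative


# Hypothetical scores for illustration only; NOT taken from the paper.
example_scores = {"en": 50.0, "de": 45.0, "zh": 40.0, "sw": 35.0}
abs_drop, rel_drop = average_drop(example_scores)
print(f"{abs_drop:.1f} points, {rel_drop:.1f}% relative")
```

The two numbers can differ substantially when the reference score is far from 100, which is why it helps to know which convention a reported drop follows.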

Abstract

Question answering on the hybrid context of tables and text (TATQA) is a critical task, with broad applications in data-intensive domains. However, existing TATQA datasets are limited to English, leading to several drawbacks: (i) They overlook the challenges of multilingual TATQA and cannot assess model performance in the multilingual setting. (ii) They do not reflect real-world scenarios where tables and texts frequently appear in non-English languages. To address these limitations, we propose the first multilingual TATQA dataset (MULTITAT). Specifically, we sample data from 3 mainstream TATQA datasets and translate it into 10 diverse languages. To align the model TATQA capabilities in English with other languages, we develop a baseline, Ours. Experimental results reveal that the performance on non-English data in MULTITAT drops by an average of 19.4% compared to English, proving the…

Code & Models

Repositories

zhxlia/multitat (Official)

Taxonomy

Topics: Topic Modeling · Data Quality and Management · Natural Language Processing Techniques

Methods: ALIGN