A Survey on non-English Question Answering Dataset

Andreas Chandra; Affandy Fahrizain; Ibrahim; Simon Willyanto Laufried

arXiv:2112.13634·cs.CL·December 28, 2021

Open Access

TL;DR

This survey reviews existing non-English question answering datasets, resources, and evaluation metrics, highlighting progress and diversity in languages like French, German, Japanese, Chinese, Arabic, and Russian, including multilingual and cross-lingual datasets.

Contribution

It provides a comprehensive summary and analysis of non-English QA datasets, resources, and evaluation methods, filling a gap in existing literature.

Findings

01. Significant growth in non-English QA datasets.

02. Diversity of languages and resource types.

03. Emergence of multilingual and cross-lingual datasets.

Abstract

Research on question answering datasets and models has attracted considerable attention in the research community. Many researchers release their own question answering datasets along with their models, and the area has seen tremendous progress. This survey aims to identify, summarize, and analyze the existing datasets released by researchers, with a focus on non-English datasets, as well as related resources such as research code and evaluation metrics. We review question answering datasets available in widely used languages other than English, such as French, German, Japanese, Chinese, Arabic, and Russian, as well as multilingual and cross-lingual question answering datasets.

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Expert finding and Q&A systems