A Survey on non-English Question Answering Dataset
Andreas Chandra, Affandy Fahrizain, Ibrahim, Simon Willyanto Laufried

TL;DR
This survey reviews existing non-English question answering datasets, resources, and evaluation metrics, highlighting progress and diversity in languages like French, German, Japanese, Chinese, Arabic, and Russian, including multilingual and cross-lingual datasets.
Contribution
It provides a comprehensive summary and analysis of non-English QA datasets, resources, and evaluation methods, filling a gap in existing literature.
Findings
Significant growth in non-English QA datasets.
Diversity of languages and resource types.
Emergence of multilingual and cross-lingual datasets.
Abstract
Question answering datasets and models have attracted considerable attention in the research community, and many research groups release their own question answering datasets together with their models. Progress in this area has been tremendous. This survey aims to identify, summarize, and analyze the existing datasets released by researchers, focusing on non-English datasets, along with related resources such as research code and evaluation metrics. We review question answering datasets available in common languages other than English, such as French, German, Japanese, Chinese, Arabic, and Russian, as well as multilingual and cross-lingual question answering datasets.
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Expert finding and Q&A systems
