AmQA: Amharic Question Answering Dataset

Tilahun Abedissa; Ricardo Usbeck; Yaregal Assabie

arXiv:2303.03290 · cs.CL · November 17, 2023 · 1 citation

TL;DR

This paper introduces AmQA, the first publicly available Amharic question answering dataset, and provides baseline results to promote research in Amharic NLP.

Contribution

The paper builds the first Amharic QA dataset, with 2628 crowdsourced question-answer pairs over 378 Wikipedia articles, and evaluates an XLMR Large-based baseline to encourage further research.

Findings

1. Baseline F-score of 69.58 in the reader-retriever QA setting
2. Baseline F-score of 71.74 in the reading comprehension setting
3. A dataset released to foster Amharic QA research

Abstract

Question Answering (QA) returns concise answers or answer lists from natural language text given a context document. Considerable resources go into curating QA datasets to advance the development of robust models. QA datasets have surged for languages like English; this is not true for Amharic. Amharic, the official language of Ethiopia, is the second most spoken Semitic language in the world. There is no published or publicly available Amharic QA dataset. Hence, to foster research in Amharic QA, we present the first Amharic QA (AmQA) dataset. We crowdsourced 2628 question-answer pairs over 378 Wikipedia articles. Additionally, we run an XLMR Large-based baseline model to spark open-domain QA research interest. The best-performing baseline achieves F-scores of 69.58 and 71.74 in the reader-retriever QA and reading comprehension settings, respectively.
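
The abstract reports F-scores but does not say how they are computed. A minimal sketch of one common choice, the token-overlap F1 used in SQuAD-style evaluation, is shown below. It assumes whitespace tokenization and no answer normalization; the function name `token_f1` is hypothetical, and the paper's actual scoring script may differ.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer.

    Assumption: tokens are whitespace-separated and compared verbatim.
    """
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        # Two empty answers count as a match; one empty answer is a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

A dataset-level score under this scheme would be the mean of `token_f1` over all questions, multiplied by 100 to match the scale of the figures reported above.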

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications