SwaQuAD-24: QA Benchmark Dataset in Swahili

Alfred Malengo Kondoro

arXiv:2410.14289 · cs.CL · October 21, 2024

Open Access

TL;DR

This paper proposes SwaQuAD-24, a Swahili question answering (QA) benchmark dataset modeled on established benchmarks such as SQuAD and intended to advance NLP research and applications for Swahili, a low-resource language.

Contribution

It proposes the creation of a high-quality, annotated Swahili QA dataset that addresses the underrepresentation of Swahili in NLP and supports a range of downstream applications.

Findings

01. The proposed dataset consists of diverse, annotated question-answer pairs

02. It is designed to support applications such as machine translation, information retrieval, and healthcare chatbots

03. It aims to foster NLP innovation in East Africa

Abstract

This paper proposes the creation of a Swahili Question Answering (QA) benchmark dataset, aimed at addressing the underrepresentation of Swahili in natural language processing (NLP). Drawing from established benchmarks like SQuAD, GLUE, KenSwQuAD, and KLUE, the dataset will focus on providing high-quality, annotated question-answer pairs that capture the linguistic diversity and complexity of Swahili. The dataset is designed to support a variety of applications, including machine translation, information retrieval, and social services like healthcare chatbots. Ethical considerations, such as data privacy, bias mitigation, and inclusivity, are central to the dataset development. Additionally, the paper outlines future expansion plans to include domain-specific content, multimodal integration, and broader crowdsourcing efforts. The Swahili QA dataset aims to foster technological innovation…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Focus