IDK-MRC: Unanswerable Questions for Indonesian Machine Reading Comprehension

Rifki Afina Putri; Alice Oh

arXiv:2210.13778·cs.CL·October 26, 2022

TL;DR

This paper introduces IDK-MRC, an Indonesian machine reading comprehension (MRC) dataset that contains both answerable and unanswerable questions. Training on it improves model performance, especially on unanswerable questions, in a low-resource language setting.

Contribution

The paper presents an Indonesian MRC dataset with both answerable and unanswerable questions, built with a combination of automated and manual methods, to support reading comprehension in a low-resource language.

Findings

01 Significant performance improvement on unanswerable questions

02 Dataset contains over 10,000 questions combining answerable and unanswerable types

03 Enhances Indonesian MRC model robustness

Abstract

Machine Reading Comprehension (MRC) has become one of the essential tasks in Natural Language Understanding (NLU), as it is often included in NLU benchmarks (Liang et al., 2020; Wilie et al., 2020). However, most MRC datasets contain only answerable questions, overlooking the importance of unanswerable questions. MRC models trained only on answerable questions will select the span that is most likely to be the answer, even when the answer does not actually exist in the given passage (Rajpurkar et al., 2018). This problem persists especially in medium- to low-resource languages like Indonesian. Existing Indonesian MRC datasets (Purwarianti et al., 2007; Clark et al., 2020) are still inadequate because of their small size and limited question types, i.e., they cover only answerable questions. To fill this gap, we build a new Indonesian MRC dataset called I(n)don'tKnow-MRC (IDK-MRC)…
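
The failure mode described in the abstract (a model always returning some span, even when the passage holds no answer) can be illustrated with a minimal sketch. It is not the paper's model or code. It assumes a generic extractive reader that emits per-token start and end scores, with token 0 reserved as a "no answer" slot, which is a common convention in SQuAD 2.0-style systems. The function names, the threshold parameter, the scores, and the toy Indonesian sentence are all hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple


def best_span(start: List[float], end: List[float], max_len: int = 30) -> Tuple[float, int, int]:
    """Return (score, i, j) for the best non-null span, skipping token 0."""
    best = (float("-inf"), -1, -1)
    for i in range(1, len(start)):
        for j in range(i, min(i + max_len, len(end))):
            score = start[i] + end[j]
            if score > best[0]:
                best = (score, i, j)
    return best


def predict(tokens: List[str], start: List[float], end: List[float],
            null_threshold: float = 0.0) -> Optional[str]:
    """Return the answer text, or None when the model should abstain.

    Assumption: token 0 is the "no answer" slot, so its start + end score
    is compared against the best real span.
    """
    null_score = start[0] + end[0]
    span_score, i, j = best_span(start, end)
    if null_score - span_score > null_threshold:
        return None  # abstain: "I don't know"
    return " ".join(tokens[i:j + 1])


# Toy passage (made up, not taken from IDK-MRC).
TOKENS = ["[CLS]", "Jakarta", "adalah", "ibu", "kota", "Indonesia"]

# Hypothetical scores for a question the passage answers.
ANSWERABLE_START = [0.1, 4.0, 0.2, 0.1, 0.1, 0.3]
ANSWERABLE_END = [0.2, 3.5, 0.1, 0.1, 0.2, 0.4]

# Hypothetical scores for a question the passage does not answer.
UNANSWERABLE_START = [5.0, 0.5, 0.2, 0.1, 0.1, 0.3]
UNANSWERABLE_END = [4.5, 0.4, 0.1, 0.1, 0.2, 0.6]

print(predict(TOKENS, ANSWERABLE_START, ANSWERABLE_END))
print(predict(TOKENS, UNANSWERABLE_START, UNANSWERABLE_END))
# An infinite threshold disables abstention, so a span is always returned.
print(predict(TOKENS, UNANSWERABLE_START, UNANSWERABLE_END, null_threshold=float("inf")))
```

The last call mimics a reader that has no way to abstain: it still returns a span for the unanswerable question, which is the behavior that training with unanswerable questions is meant to correct.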

Code & Models

Repositories

rifkiaputri/idk-mrc
none · Official

Datasets

SEACrowd/idk_mrc
dataset · 98 dl

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications