Sentence Extraction-Based Machine Reading Comprehension for Vietnamese
Phong Nguyen-Thuan Do, Nhat Duy Nguyen, Tin Van Huynh, Kiet Van Nguyen, Anh Gia-Tuan Nguyen, Ngan Luu-Thuy Nguyen

TL;DR
This paper introduces UIT-ViWikiQA, a new Vietnamese dataset for sentence extraction-based machine reading comprehension, along with models and analysis to advance NLP research in Vietnamese.
Contribution
It presents the first Vietnamese dataset for sentence extraction-based MRC, proposes a conversion algorithm to build it, and evaluates models including XLM-R_Large on this dataset.
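The conversion idea can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given on this page. It assumes a SQuAD-style record with a character offset `answer_start`, and it uses a naive punctuation-based sentence splitter; the function names `split_sentences` and `span_to_sentence` are hypothetical.

```python
import re


def split_sentences(context):
    """Return (start, end) character offsets for each sentence.

    Assumption: sentences end at '.', '!' or '?' followed by whitespace
    or the end of the text. Abbreviations and decimals are not handled.
    """
    spans = []
    start = 0
    for match in re.finditer(r"[.!?]+(?:\s+|$)", context):
        spans.append((start, match.end()))
        start = match.end()
    if start < len(context):
        # Trailing text without closing punctuation.
        spans.append((start, len(context)))
    return spans


def span_to_sentence(context, answer_start):
    """Map a span-level answer offset to the sentence that contains it."""
    for start, end in split_sentences(context):
        if start <= answer_start < end:
            raw = context[start:end]
            # Skip leading whitespace so the offset matches the stripped text.
            lead = len(raw) - len(raw.lstrip())
            return {"text": raw.strip(), "answer_start": start + lead}
    raise ValueError("answer_start lies outside the context")


context = (
    "Hà Nội là thủ đô của Việt Nam. "
    "Thành phố nằm bên sông Hồng. "
    "Dân số khoảng 8 triệu người."
)
converted = span_to_sentence(context, context.index("sông Hồng"))
print(converted["text"])
```

A production version would likely need a Vietnamese-aware sentence segmenter, since punctuation-based splitting breaks on abbreviations and numbers.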
Findings
XLM-R_Large achieves 85.97% EM and 88.77% F1 score.
The paper analyzes how question types and context affect model performance.
The dataset highlights challenges in Vietnamese MRC.
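For readers unfamiliar with the two reported metrics, the sketch below shows how exact match (EM) and token-level F1 are commonly computed in SQuAD-style evaluation. It assumes whitespace tokenization with lowercasing only; the exact normalization used in the paper is not stated on this page, and the function names are illustrative.

```python
from collections import Counter


def normalize(text):
    """Lowercase and collapse whitespace (an assumed, minimal normalization)."""
    return " ".join(text.lower().split())


def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction, gold):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both sequences.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


em = exact_match("Sông Hồng", "sông hồng")
f1 = f1_score("a b c", "a b")
print(em, round(f1, 2))
```

Dataset-level scores are typically the average of these per-question values, expressed as percentages.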
Abstract
The development of natural language processing (NLP) in general, and machine reading comprehension in particular, has attracted great attention from the research community. In recent years, a few large datasets for machine reading comprehension in Vietnamese have been released, such as UIT-ViQuAD and UIT-ViNewsQA. However, the answers in these datasets are not diverse enough to serve the research. In this paper, we introduce UIT-ViWikiQA, the first dataset for evaluating sentence extraction-based machine reading comprehension in the Vietnamese language. The UIT-ViWikiQA dataset is converted from the UIT-ViQuAD dataset and comprises 23,074 question-answer pairs based on 5,109 passages from 174 Vietnamese Wikipedia articles. We propose a conversion algorithm to create the dataset for sentence extraction-based machine reading comprehension and three types of approaches for sentence…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
