XLMRQA: Open-Domain Question Answering on Vietnamese Wikipedia-based Textual Knowledge Source
Kiet Van Nguyen, Phong Nguyen-Thuan Do, Nhat Duy Nguyen, Tin Van Huynh, Anh Gia-Tuan Nguyen, Ngan Luu-Thuy Nguyen

TL;DR
This paper introduces XLMRQA, a transformer-based Vietnamese open-domain question answering system that uses Wikipedia as its textual knowledge source and significantly outperforms existing neural network-based QA systems.
Contribution
It is the first Vietnamese QA system utilizing a supervised transformer-based reader trained on Wikipedia data, addressing low-resource language challenges.
Findings
XLMRQA outperforms DrQA and BERTserini by 24.46% and 6.28%, respectively.
Question types influence QA system performance.
Demonstrates effectiveness of transformer models for Vietnamese QA.
Abstract
Question answering (QA) is a natural language understanding task within the fields of information retrieval and information extraction. It has attracted much attention from the computational linguistics and artificial intelligence research communities in recent years because of the strong development of machine reading comprehension (MRC) models. A reader-based QA system is a high-level search engine that uses MRC techniques to find correct answers to queries or questions in open-domain or domain-specific texts. Most advances in data resources and machine-learning approaches for MRC and QA systems have been made in two resource-rich languages, English and Chinese. Research on QA systems for a low-resource language like Vietnamese remains scarce. This paper presents XLMRQA, the first Vietnamese QA…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
