KorQuAD1.0: Korean QA Dataset for Machine Reading Comprehension
Seungyoung Lim, Myungji Kim, Jooyoul Lee

TL;DR
KorQuAD1.0 is a large-scale Korean dataset of 70,000+ human-generated question-answer pairs on Korean Wikipedia articles, built for extractive machine reading comprehension and released alongside a public challenge to encourage multilingual NLP research.
Contribution
It introduces the first large-scale Korean MRC dataset, enabling research in Korean language understanding and multilingual NLP tasks.
Findings
Provides a comprehensive Korean QA dataset with 70,000+ pairs
Facilitates development of Korean MRC models and benchmarks
Encourages multilingual NLP research through a public challenge
Abstract
Machine Reading Comprehension (MRC) is a task that requires a machine to understand natural language and answer questions by reading a document. It is at the core of automatic response technologies such as chatbots and automated customer support systems. We present the Korean Question Answering Dataset (KorQuAD), a large-scale Korean dataset for the extractive machine reading comprehension task. It consists of 70,000+ human-generated question-answer pairs on Korean Wikipedia articles. We release KorQuAD1.0 and launch a challenge at https://KorQuAD.github.io to encourage the development of multilingual natural language processing research.
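To make "extractive" concrete: in an extractive MRC task, the answer is a contiguous span of the document rather than free-form text. The sketch below is illustrative only. The class name, field names, and the English example text are hypothetical and are not taken from the KorQuAD release; they merely mirror the common span-based layout of a context, a question, an answer string, and a character offset.

```python
from dataclasses import dataclass


@dataclass
class ExtractiveExample:
    """One hypothetical extractive QA record (field names are assumptions)."""
    context: str       # the passage the machine reads
    question: str      # the question posed about the passage
    answer_text: str   # the answer, which must appear verbatim in the context
    answer_start: int  # character offset of the answer within the context


def is_valid_span(ex: ExtractiveExample) -> bool:
    """Check that the answer is exactly the context slice at the stated offset."""
    end = ex.answer_start + len(ex.answer_text)
    return ex.context[ex.answer_start:end] == ex.answer_text


# Illustrative English text; the real dataset is in Korean.
context = (
    "KorQuAD1.0 consists of 70,000+ human-generated question-answer pairs "
    "on Korean Wikipedia articles."
)
answer = "Korean Wikipedia articles"

example = ExtractiveExample(
    context=context,
    question="What are the question-answer pairs written about?",
    answer_text=answer,
    answer_start=context.index(answer),  # locate the span instead of hand-counting
)
```

A model for this kind of task predicts the start and end positions of the span, so a record whose offset does not line up with its answer text cannot be used as a training target. A check like `is_valid_span(example)` is a cheap sanity test before training.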
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
