Multilingual Non-Factoid Question Answering with Answer Paragraph Selection
Ritwik Mishra, Sreeram Vennam, Rajiv Ratn Shah, Ponnurangam Kumaraguru

TL;DR
This paper introduces MuNfQuAD, a large multilingual dataset for non-factoid question answering, and shows that a fine-tuned Answer Paragraph Selection (APS) model reaches 80% accuracy on the test set and generalizes across languages.
Contribution
The paper contributes MuNfQuAD, the largest multilingual non-factoid QA dataset, together with a fine-tuned APS model that outperforms baselines and generalizes across languages.
Findings
The dataset contains over 578K QA pairs across 38 languages.
The APS model achieved 80% accuracy on the test set.
The model effectively generalizes to unseen languages and reduces context length.
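The context-reduction finding can be illustrated with a small sketch. It treats APS as a filter that scores each paragraph against the question and keeps only those above a threshold. The scorer used here, `overlap_score`, is a hypothetical token-overlap stand-in chosen so the example runs without dependencies; it is not the paper's fine-tuned model, and the `threshold` value is likewise an assumption.

```python
from typing import Callable, List


def overlap_score(question: str, paragraph: str) -> float:
    """Hypothetical stand-in scorer: fraction of question tokens found in the paragraph.

    A real APS model would be a fine-tuned classifier; this is only a placeholder.
    """
    q_tokens = set(question.lower().split())
    p_tokens = set(paragraph.lower().split())
    if not q_tokens:
        return 0.0
    return len(q_tokens & p_tokens) / len(q_tokens)


def select_paragraphs(
    question: str,
    paragraphs: List[str],
    score_fn: Callable[[str, str], float] = overlap_score,
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[str]:
    """Keep only paragraphs whose score meets the threshold, preserving order."""
    return [p for p in paragraphs if score_fn(question, p) >= threshold]


question = "why did prices rise"
article = [
    "prices did rise because supply fell",
    "the weather was sunny",
]
selected = select_paragraphs(question, article)
```

Swapping `score_fn` for a trained classifier's probability output would keep the same filtering logic while shortening the context passed to a downstream reader.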
Abstract
Most existing Question Answering Datasets (QuADs) primarily focus on factoid-based short-context Question Answering (QA) in high-resource languages. However, the scope of such datasets for low-resource languages remains limited, with only a few works centered on factoid-based QuADs and none on non-factoid QuADs. Therefore, this work presents MuNfQuAD, a multilingual QuAD with non-factoid questions. It utilizes interrogative sub-headings from BBC news articles as questions and the corresponding paragraphs as silver answers. The dataset comprises over 578K QA pairs across 38 languages, encompassing several low-resource languages, and stands as the largest multilingual QA dataset to date. Based on the manual annotations of 790 QA-pairs from MuNfQuAD (golden set), we observe that 98% of questions can be answered using their corresponding silver answer. Our fine-tuned Answer Paragraph…
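The silver-answer construction described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Section` type, the function names, and the rule that a sub-heading is interrogative when it ends with a question mark are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Section:
    """Hypothetical container for one article section."""
    heading: str
    paragraphs: List[str]


# Assumed heuristic: Latin, Arabic, and full-width question marks.
QUESTION_MARKS = ("?", "؟", "？")


def is_interrogative(heading: str) -> bool:
    """Treat a sub-heading as a question if it ends with a question mark."""
    return heading.strip().endswith(QUESTION_MARKS)


def extract_silver_pairs(sections: List[Section]) -> List[Dict[str, object]]:
    """Pair each interrogative sub-heading with the paragraphs beneath it."""
    pairs = []
    for section in sections:
        if is_interrogative(section.heading) and section.paragraphs:
            pairs.append(
                {
                    "question": section.heading.strip(),
                    "silver_answer": list(section.paragraphs),
                }
            )
    return pairs


sections = [
    Section("Background", ["Some context."]),
    Section("Why did prices rise? ", ["Supply fell.", "Demand grew."]),
]
pairs = extract_silver_pairs(sections)
```

Only the second section yields a pair here, since "Background" is not phrased as a question.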
Taxonomy
Topics: Topic Modeling · Expert finding and Q&A systems · Speech and dialogue systems
Methods: Focus
