NCTB-QA: A Large-Scale Bangla Educational Question Answering Dataset and Benchmarking Performance
Abrar Eyasir, Tahsin Ahmed, Muhammad Ibrahim

TL;DR
This paper introduces NCTB-QA, a large-scale Bangla educational question answering dataset with a balanced mix of answerable and unanswerable questions, and benchmarks fine-tuned transformer models on it to improve QA robustness in low-resource language settings.
Contribution
It presents the first large-scale, balanced Bangla QA dataset with adversarial examples and benchmarks multiple transformer models, highlighting the importance of domain-specific fine-tuning.
Findings
BERT reaches an F1 score of 0.620 after fine-tuning, a 313% relative improvement over its 0.150 baseline.
All models show significant gains in semantic answer quality.
NCTB-QA is a challenging benchmark for Bangla educational QA.
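The 313% figure is a relative gain measured against the pre-fine-tuning baseline, not an absolute difference in F1 points. The short sketch below checks that arithmetic using only the two F1 values reported in the abstract. The helper name `relative_improvement` is hypothetical.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the relative gain of `improved` over `baseline`, as a percentage (hypothetical helper)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


# BERT F1 before (0.150) and after (0.620) fine-tuning, as reported in the abstract.
gain = relative_improvement(0.150, 0.620)
print(f"{gain:.0f}%")  # prints "313%"
```

In absolute terms the same change is 0.47 F1 points, so the two framings describe one result.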
Abstract
Reading comprehension systems for low-resource languages face significant challenges in handling unanswerable questions. These systems tend to produce unreliable responses when the correct answer is absent from the context. To address this problem, we introduce NCTB-QA, a large-scale Bangla question answering dataset comprising 87,805 question-answer pairs extracted from 50 textbooks published by Bangladesh's National Curriculum and Textbook Board. Unlike existing Bangla datasets, NCTB-QA maintains a balanced distribution of answerable (57.25%) and unanswerable (42.75%) questions. NCTB-QA also includes adversarially designed instances containing plausible distractors. We benchmark three transformer-based models (BERT, RoBERTa, ELECTRA) and demonstrate substantial improvements through fine-tuning. BERT achieves a 313% relative improvement in F1 score (0.150 to 0.620). Semantic answer quality…
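Scoring a dataset that mixes answerable and unanswerable questions requires a rule for the case where no answer exists. The abstract does not spell out the exact metric, so the sketch below assumes the common SQuAD 2.0-style convention: token-overlap F1 for answerable questions, and for an unanswerable question (empty gold answer) a score of 1.0 only if the model also abstains with an empty prediction. The normalization (stripping ASCII punctuation and the Bangla danda, then splitting on whitespace) and all function names are assumptions for illustration, not the authors' evaluation code.

```python
import string
from collections import Counter

# Assumed normalization: drop ASCII punctuation plus the Bangla danda/double danda.
# (The official SQuAD script also removes English articles, which does not apply to Bangla.)
_PUNCT = set(string.punctuation) | {"\u0964", "\u0965"}


def normalize(text: str) -> list[str]:
    """Lowercase, remove punctuation, and split on whitespace."""
    cleaned = "".join(ch for ch in text.lower() if ch not in _PUNCT)
    return cleaned.split()


def token_f1(prediction: str, gold: str) -> float:
    """SQuAD 2.0-style token F1; an empty string means 'no answer'."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    # Unanswerable case: full credit only when both sides are empty (model abstained correctly).
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def mean_f1(pairs: list[tuple[str, str]]) -> float:
    """Average token F1 over (prediction, gold) pairs."""
    return sum(token_f1(p, g) for p, g in pairs) / len(pairs)


if __name__ == "__main__":
    examples = [
        ("ঢাকা", "ঢাকা"),   # answerable, exact match -> 1.0
        ("", ""),           # unanswerable, model abstains -> 1.0
        ("রাজশাহী", ""),    # unanswerable, model answers anyway -> 0.0
    ]
    print(round(mean_f1(examples), 4))
```

The abstention rule is what makes unanswerable questions hard under this kind of metric: a model that always extracts some span, even a plausible distractor, receives zero credit on every unanswerable item.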
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
