A Collection of Question Answering Datasets for Norwegian
Vladislav Mikhailov, Petter Mæhlum, Victoria Ovedie Chruickshank Langø, Erik Velldal, Lilja Øvrelid

TL;DR
This paper introduces a suite of four Norwegian question answering datasets covering world knowledge, commonsense reasoning, truthfulness, and knowledge about Norway. It evaluates 11 language models and finds that most perform better in Bokmål than in Nynorsk, struggle most with commonsense reasoning, and often give untruthful answers.
Contribution
It presents new Norwegian QA datasets spanning multiple knowledge domains and both written standards (Bokmål and Nynorsk), and evaluates language models on these datasets in zero- and few-shot regimes.
Findings
Most models perform better in Bokmål than in Nynorsk.
Models struggle most with commonsense reasoning.
Models are often untruthful when generating answers to questions.
Abstract
This paper introduces a new suite of question answering datasets for Norwegian: NorOpenBookQA, NorCommonSenseQA, NorTruthfulQA, and NRK-Quiz-QA. The data covers a wide range of skills and knowledge domains, including world knowledge, commonsense reasoning, truthfulness, and knowledge about Norway. Covering both written standards of Norwegian, Bokmål and Nynorsk, our datasets comprise over 10k question-answer pairs, created by native speakers. We detail our dataset creation approach and present the results of evaluating 11 language models (LMs) in zero- and few-shot regimes. Most LMs perform better in Bokmål than in Nynorsk, struggle most with commonsense reasoning, and are often untruthful in generating answers to questions. All our datasets and annotation materials are publicly available.
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
