Danoliteracy of Generative Large Language Models
Søren Vejlgaard Holm, Lars Kai Hansen, Martin Carsten Nielsen

TL;DR
This paper introduces a benchmark to evaluate Danish language and cultural understanding in large language models, revealing high correlation with human feedback and a model consistency factor across diverse scenarios.
Contribution
It presents the first Danish language benchmark for GLLMs, establishing a robust evaluation method and analyzing model consistency in language adaptation.
Findings
GPT-4 and Claude Opus achieve highest rankings
Benchmark correlates with human feedback at 0.8
A strong underlying factor explains 95% of performance variance
Abstract
The language technology moonshot moment of Generative Large Language Models (GLLMs) was not limited to English: these models brought a surge of technological applications, investments, and hype to low-resource languages as well. However, the capabilities of these models in languages such as Danish were, until recently, difficult to verify beyond qualitative demonstrations due to a lack of applicable evaluation corpora. We present a GLLM benchmark to evaluate Danoliteracy, a measure of Danish language and cultural competency, across eight diverse scenarios such as Danish citizenship tests and abstractive social media question answering. This limited-size benchmark was found to produce a robust ranking that correlates with human feedback at 0.8, with GPT-4 and Claude Opus models achieving the highest rankings. Analyzing these model results across scenarios, we find one…
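The two statistics quoted above (a correlation with human feedback and the share of variance explained by a single factor) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the score matrix is synthetic rather than the paper's data, the names `spearman` and `first_factor_share` are hypothetical, Spearman rank correlation is assumed because the summary does not say which correlation was used, and principal component analysis stands in for whatever factor analysis the authors applied.

```python
import numpy as np


def spearman(x, y):
    """Spearman rank correlation for tie-free inputs (Pearson correlation of ranks)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])


def first_factor_share(scores):
    """Share of total variance captured by the first principal component.

    scores has shape (n_models, n_scenarios).
    """
    centered = scores - scores.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return float(variances[0] / variances.sum())


# Synthetic stand-in data (NOT the paper's results): 12 hypothetical models,
# 8 scenarios, all driven by one latent ability plus a little noise.
rng = np.random.default_rng(0)
ability = rng.normal(size=12)
loadings = rng.uniform(0.8, 1.2, size=8)
scores = np.outer(ability, loadings) + 0.1 * rng.normal(size=(12, 8))

# Hypothetical human preference score per model, a noisy view of the same ability.
human = ability + 0.5 * rng.normal(size=12)

benchmark_mean = scores.mean(axis=1)  # one aggregate benchmark score per model
rho = spearman(benchmark_mean, human)
share = first_factor_share(scores)

print(f"rank correlation with human feedback: {rho:.2f}")
print(f"variance explained by first factor: {share:.1%}")
```

Because the synthetic scores are generated from a single latent ability, the first component dominates by construction; on real benchmark results a high share would be an empirical finding rather than a given.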
Taxonomy
Topics: Music and Audio Processing
Methods: Linear Layer · Dense Connections · Label Smoothing · Layer Normalization · Residual Connection · Graph Self-Attention · Byte Pair Encoding · Absolute Position Encodings · RAdam · Attention Is All You Need
