Towards Multilingual LLM Evaluation for Baltic and Nordic languages: A study on Lithuanian History
Yevhen Kostiuk, Oxana Vitman, Łukasz Gagała, Artur Kiulian

TL;DR
This study evaluates multilingual LLMs on Lithuanian history questions across Baltic, Nordic, and other languages, revealing GPT-4o's superior performance and highlighting challenges in cultural language alignment.
Contribution
The paper evaluates a range of multilingual LLMs on Lithuanian history questions and examines how the language of the question, and its cultural and historical closeness to Lithuanian, affects model performance.
Findings
GPT-4o outperforms other models across languages.
Larger open-source models (QWEN2.5 72b, LLaMa3.1 70b) perform well but show weaker alignment with Baltic languages.
Nordic fine-tuned models do not outperform general multilingual models.
Abstract
In this work, we evaluated Lithuanian and general history knowledge of multilingual Large Language Models (LLMs) on a multiple-choice question-answering task. The models were tested on a dataset of Lithuanian national and general history questions translated into Baltic, Nordic, and other languages (English, Ukrainian, Arabic) to assess the knowledge sharing from culturally and historically connected groups. We evaluated GPT-4o, LLaMa3.1 8b and 70b, QWEN2.5 7b and 72b, Mistral Nemo 12b, LLaMa3 8b, Mistral 7b, LLaMa3.2 3b, and Nordic fine-tuned models (GPT-SW3 and LLaMa3 8b). Our results show that GPT-4o consistently outperformed all other models across language groups, with slightly better results for Baltic and Nordic languages. Larger open-source models like QWEN2.5 72b and LLaMa3.1 70b performed well but showed weaker alignment with Baltic languages. Smaller models (Mistral Nemo…
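The per-language-group comparison described in the abstract can be illustrated with a minimal sketch, assuming each model answer is stored as a record with a language code, the predicted option, and the gold option. The record fields, the function name `accuracy_by_group`, and the `LANGUAGE_GROUPS` mapping are hypothetical and are not taken from the paper. Only English, Ukrainian, and Arabic are named in the abstract, so the Baltic and Nordic codes below are assumptions.

```python
from collections import defaultdict

# Hypothetical mapping from language code to language group.
# Only en, uk and ar are named in the abstract; the Baltic and
# Nordic entries are assumed for illustration.
LANGUAGE_GROUPS = {
    "lt": "Baltic",
    "lv": "Baltic",
    "sv": "Nordic",
    "en": "Other",
    "uk": "Other",
    "ar": "Other",
}


def accuracy_by_group(records, groups=LANGUAGE_GROUPS):
    """Return multiple-choice accuracy per language group.

    Each record is a dict with the (assumed) keys:
      "lang"      - language code of the question
      "predicted" - option letter chosen by the model
      "gold"      - correct option letter
    Languages missing from `groups` are counted under "Unknown".
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        group = groups.get(rec["lang"], "Unknown")
        total[group] += 1
        if rec["predicted"] == rec["gold"]:
            correct[group] += 1
    # Only groups with at least one record appear, so no division by zero.
    return {group: correct[group] / total[group] for group in total}
```

Calling `accuracy_by_group` once per model on that model's answer records gives one accuracy value per language group, which is the kind of table needed to compare models across Baltic, Nordic, and other languages.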
Taxonomy
Topics: Natural Language Processing Techniques · Linguistics and terminology studies
