MiniLingua: A Small Open-Source LLM for European Languages

Anna Aksenova; Boris Zverkov; Nicola Dainese; Alexander Nikitin; Pekka Marttinen

arXiv:2512.13298 · cs.CL · December 16, 2025

TL;DR

MiniLingua is a small, open-source multilingual language model with about one billion parameters, trained from scratch on 13 European languages. It outperforms EuroLLM on summarization, classification, and question answering, and its small size makes it suitable for on-device deployment.

Contribution

The paper introduces MiniLingua, a one-billion-parameter multilingual LLM trained from scratch for European languages, and releases it as open source. Its instruction-tuned version outperforms EuroLLM, a similarly trained model with a larger training budget.

Findings

1. The instruction-tuned MiniLingua outperforms EuroLLM on summarization, classification, and both open- and closed-book question answering.
2. It remains competitive with more advanced state-of-the-art models on open-ended generation.
3. The open-source release includes the model weights, tokenizer, and training code.

Abstract

Large language models are powerful but often limited by high computational cost, privacy concerns, and English-centric training. Recent progress demonstrates that small, efficient models with around one billion parameters can deliver strong results and enable on-device use. This paper introduces MiniLingua, a multilingual open-source LLM of one billion parameters trained from scratch for 13 European languages, designed to balance coverage and instruction-following capabilities. Based on evaluation results, the instruction-tuned version of MiniLingua outperforms EuroLLM, a model with a similar training approach but a larger training budget, on summarization, classification and both open- and closed-book question answering. Moreover, it remains competitive with more advanced state-of-the-art models on open-ended generation tasks. We release model weights, tokenizer and source code used…
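Whether a model of roughly one billion parameters fits on a device depends partly on how much memory its weights need. The back-of-the-envelope sketch below is not from the paper. It estimates weight storage for a model with a nominal 1e9 parameters at common numeric precisions. The exact parameter count and the precision in which MiniLingua is distributed are assumptions here, and activations, the KV cache, and runtime overhead are ignored.

```python
# Rough weight-memory estimate for a model with a nominal one billion
# parameters. Assumption: exactly 1.0e9 parameters; a real model's count differs.
# Counts weights only (no activations, KV cache, or runtime overhead).

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Return the memory needed to store the weights, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    n = 1.0e9  # nominal parameter count, assumed for illustration
    for dtype, b in BYTES_PER_PARAM.items():
        print(f"{dtype:>10}: {weight_memory_gib(n, b):.2f} GiB")
```

Halving the bytes per parameter halves the weight footprint. This is why quantized one-billion-parameter models are commonly cited as practical on phones and laptops, although actual memory use at inference time is higher than the weight figure alone.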

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification