MiniLingua: A Small Open-Source LLM for European Languages
Anna Aksenova, Boris Zverkov, Nicola Dainese, Alexander Nikitin, Pekka Marttinen

TL;DR
MiniLingua is a small (one-billion-parameter), open-source multilingual language model trained from scratch on 13 European languages. Its instruction-tuned version performs strongly on summarization, classification, and question answering, and its size makes it suitable for on-device deployment.
Contribution
The paper introduces MiniLingua, a one-billion-parameter multilingual LLM trained from scratch for 13 European languages and released as open source. Its instruction-tuned version outperforms EuroLLM, a model with a similar training approach but a larger training budget.
Findings
The instruction-tuned version of MiniLingua outperforms EuroLLM on summarization, classification, and both open- and closed-book question answering.
It remains competitive with more advanced state-of-the-art models on open-ended generation tasks.
Open-source release includes model weights, tokenizer, and training code.
Abstract
Large language models are powerful but often limited by high computational cost, privacy concerns, and English-centric training. Recent progress demonstrates that small, efficient models with around one billion parameters can deliver strong results and enable on-device use. This paper introduces MiniLingua, a multilingual open-source LLM of one billion parameters trained from scratch for 13 European languages, designed to balance coverage and instruction-following capabilities. Based on evaluation results, the instruction-tuned version of MiniLingua outperforms EuroLLM, a model with a similar training approach but a larger training budget, on summarization, classification and both open- and closed-book question answering. Moreover, it remains competitive with more advanced state-of-the-art models on open-ended generation tasks. We release model weights, tokenizer and source code used…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
