TildeOpen LLM: Leveraging Curriculum Learning to Achieve Equitable Language Representation
Toms Bergmanis, Martins Kronis, Ingus Jānis Pretkalniņš, Dāvis Nicmanis, Jeļizaveta Jelinska, Roberts Rozis, Rinalds Vīksna, Mārcis Pinnis

TL;DR
TildeOpen LLM is a 30-billion-parameter multilingual model trained on 34 European languages, using curriculum learning and data balancing to improve performance in low-resource languages, and is openly available.
Contribution
The paper introduces a training strategy that combines dataset upsampling with a curriculum schedule alternating between uniform and natural language distributions, aimed at improving multilingual model performance on low-resource languages.
Findings
Outperforms existing open-weight multilingual models in text generation and comprehension, particularly for Baltic, Finno-Ugric, and Slavic languages.
Achieves up to tenfold reduction in linguistic errors for low-resource languages.
Performs favorably against other multilingual LLMs despite being trained with significantly fewer computing resources.
Abstract
Large language models often underperform in many European languages due to the dominance of English and a few high-resource languages in training data. This paper presents TildeOpen LLM, a 30-billion-parameter open-weight foundational model trained for 34 European languages to promote linguistic equity and improve performance for low-resource languages. To address the data imbalance, we combine dataset upsampling with a curriculum-based training schedule that alternates between uniform and natural language distributions. The resulting model performs favorably compared to other multilingual LLMs despite being trained with significantly fewer computing resources. Evaluation across multiple multilingual benchmarks shows that TildeOpen surpasses existing open-weight models in text generation and comprehension, particularly for Baltic, Finno-Ugric, and Slavic languages. Human evaluations…
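
The abstract describes a schedule that alternates between uniform and natural language distributions. A minimal sketch of that idea follows; it is an illustration under stated assumptions, not the authors' implementation. The token counts, the three-phase layout in `PHASES`, and the function names are all hypothetical, since the summary does not give the actual phase boundaries or data sizes.

```python
from typing import Dict, List, Tuple


def natural_distribution(token_counts: Dict[str, float]) -> Dict[str, float]:
    """Sampling weights proportional to how much data each language has."""
    total = sum(token_counts.values())
    return {lang: count / total for lang, count in token_counts.items()}


def uniform_distribution(token_counts: Dict[str, float]) -> Dict[str, float]:
    """Equal sampling weight per language, regardless of data size."""
    share = 1.0 / len(token_counts)
    return {lang: share for lang in token_counts}


def sampling_weights(
    step: int,
    total_steps: int,
    token_counts: Dict[str, float],
    phases: List[Tuple[float, str]],
) -> Dict[str, float]:
    """Return per-language sampling weights for the current training step.

    `phases` is a list of (end_fraction, kind) pairs sorted by end_fraction,
    where kind is "uniform" or "natural". The layout is an assumption made
    for this sketch.
    """
    progress = step / total_steps
    kind = phases[-1][1]  # fall back to the last phase at the very end
    for end_fraction, phase_kind in phases:
        if progress < end_fraction:
            kind = phase_kind
            break
    if kind == "uniform":
        return uniform_distribution(token_counts)
    return natural_distribution(token_counts)


def implied_upsampling(token_counts: Dict[str, float]) -> Dict[str, float]:
    """How much more often a language is seen under uniform sampling
    than under natural sampling (values above 1 mean upsampling)."""
    natural = natural_distribution(token_counts)
    uniform = uniform_distribution(token_counts)
    return {lang: uniform[lang] / natural[lang] for lang in token_counts}


# Hypothetical token counts (arbitrary units) for three languages.
COUNTS = {"en": 800.0, "de": 150.0, "lv": 50.0}

# Hypothetical schedule: uniform, then natural, then uniform again.
PHASES = [(0.3, "uniform"), (0.7, "natural"), (1.0, "uniform")]

if __name__ == "__main__":
    for step in (0, 50, 99):
        print(step, sampling_weights(step, 100, COUNTS, PHASES))
    print(implied_upsampling(COUNTS))
```

In this toy setup, a low-resource language such as `lv` is sampled far more often during the uniform phases than its share of the raw data would suggest, which is the upsampling effect; the natural phases restore the original proportions.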
