Bielik v3 Small: Technical Report

Krzysztof Ociepa; Łukasz Flis; Remigiusz Kinas; Krzysztof Wróbel; Adrian Gwoździej

arXiv:2505.02550·cs.LG·May 12, 2025

Open Access · 10 Models

TL;DR

Bielik v3 introduces compact Polish language models (1.5B and 4.5B) that achieve performance comparable to much larger models with substantially fewer computational resources, using a custom Polish tokenizer (APT4), Weighted Instruction Cross-Entropy Loss, and an Adaptive Learning Rate.

Contribution

The paper presents Bielik v3, a series of parameter-efficient Polish language models (1.5B and 4.5B) built with a custom tokenizer (APT4) and two training techniques, Weighted Instruction Cross-Entropy Loss and Adaptive Learning Rate, setting new benchmarks for resource-constrained language models.

Findings

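01. The 1.5B and 4.5B models achieve performance comparable to much larger counterparts while requiring substantially fewer computational resources.

02. The 4.5B model is competitive with models 2-3 times larger.

03. The models achieve strong results on multiple Polish benchmarks, including the Open PL LLM Leaderboard, Complex Polish Text Understanding Benchmark, Polish EQ-Bench, and Polish Medical Leaderboard.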

Abstract

We introduce Bielik v3, a series of parameter-efficient generative text models (1.5B and 4.5B) optimized for Polish language processing. These models demonstrate that smaller, well-optimized architectures can achieve performance comparable to much larger counterparts while requiring substantially fewer computational resources. Our approach incorporates several key innovations: a custom Polish tokenizer (APT4) that significantly improves token efficiency, Weighted Instruction Cross-Entropy Loss to balance learning across instruction types, and Adaptive Learning Rate that dynamically adjusts based on training progress. Trained on a meticulously curated corpus of 292 billion tokens spanning 303 million documents, these models excel across multiple benchmarks, including the Open PL LLM Leaderboard, Complex Polish Text Understanding Benchmark, Polish EQ-Bench, and Polish Medical Leaderboard.…
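The abstract names Weighted Instruction Cross-Entropy Loss without giving its formula here. The following is only a minimal sketch of one plausible reading, assuming each instruction example carries a scalar weight and the loss is the weight-normalized mean of per-example token cross-entropies. The function names, the input layout, and the weighting scheme are hypothetical illustrations, not the authors' implementation.

```python
import math


def token_cross_entropy(logits, target):
    """Cross-entropy for one token: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def weighted_instruction_ce(examples):
    """Weight-normalized mean of per-example losses (hypothetical scheme).

    examples: list of (weight, tokens), where tokens is a list of
    (logits, target_index) pairs for the response tokens of one example.
    The per-example weight is an assumption made for this sketch.
    """
    total = 0.0
    weight_sum = 0.0
    for weight, tokens in examples:
        if not tokens:
            continue  # skip examples with no supervised tokens
        example_loss = sum(token_cross_entropy(l, t) for l, t in tokens) / len(tokens)
        total += weight * example_loss
        weight_sum += weight
    if weight_sum == 0:
        raise ValueError("no weighted examples to average over")
    return total / weight_sum


# Usage: two toy examples with uniform logits; the second is weighted 3x.
easy = [([0.0, 0.0], 0)]             # loss = log(2)
hard = [([0.0, 0.0, 0.0, 0.0], 1)]   # loss = log(4)
loss = weighted_instruction_ce([(1.0, easy), (3.0, hard)])
```

In this sketch, raising an example's weight pulls the batch loss toward that example's own loss, which is one simple way to balance learning across instruction types.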

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification