Bielik v3 Small: Technical Report
Krzysztof Ociepa, Łukasz Flis, Remigiusz Kinas, Krzysztof Wróbel, Adrian Gwoździej

TL;DR
Bielik v3 introduces compact, efficient Polish language models that achieve high performance with fewer resources through innovative tokenization, balanced training, and dynamic learning rate adjustments.
Contribution
The paper presents Bielik v3, a series of parameter-efficient Polish language models with novel training techniques and a custom tokenizer, setting new benchmarks for resource-constrained language AI.
Findings
Models achieve performance comparable to larger counterparts.
The 4.5B model is competitive with models 2-3 times larger.
Strong results on multiple Polish language benchmarks.
Abstract
We introduce Bielik v3, a series of parameter-efficient generative text models (1.5B and 4.5B) optimized for Polish language processing. These models demonstrate that smaller, well-optimized architectures can achieve performance comparable to much larger counterparts while requiring substantially fewer computational resources. Our approach incorporates several key innovations: a custom Polish tokenizer (APT4) that significantly improves token efficiency, Weighted Instruction Cross-Entropy Loss to balance learning across instruction types, and Adaptive Learning Rate that dynamically adjusts based on training progress. Trained on a meticulously curated corpus of 292 billion tokens spanning 303 million documents, these models excel across multiple benchmarks, including the Open PL LLM Leaderboard, Complex Polish Text Understanding Benchmark, Polish EQ-Bench, and Polish Medical Leaderboard.…
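The abstract names a Weighted Instruction Cross-Entropy Loss but this summary does not give its formula. As a rough illustration only, the sketch below assumes one plausible reading: each training example carries a scalar weight, and the loss is the weighted mean of per-token negative log-likelihoods. The function names, the tuple layout, and the normalization by total weight are all assumptions made for this example, not the authors' implementation.

```python
import math


def token_nll(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def weighted_instruction_ce(examples):
    """Weighted mean of per-token NLL (hypothetical formulation).

    `examples` is a list of (token_logits, token_targets, weight) tuples:
      token_logits  - one list of logits per response token
      token_targets - the index of the correct token at each position
      weight        - assumed per-example weight (e.g. by instruction type)
    """
    total = 0.0
    norm = 0.0
    for token_logits, token_targets, weight in examples:
        for logits, target in zip(token_logits, token_targets):
            total += weight * token_nll(logits, target)
            norm += weight
    if norm == 0.0:
        raise ValueError("all weights are zero or there are no tokens")
    return total / norm


# Two one-token examples with uniform logits over 2 and 4 classes.
example_a = ([[0.0, 0.0]], [0], 1.0)            # NLL = log 2, weight 1
example_b = ([[0.0, 0.0, 0.0, 0.0]], [1], 3.0)  # NLL = log 4, weight 3
loss = weighted_instruction_ce([example_a, example_b])
```

With these weights the second example contributes three times as much to the average as the first, which is the balancing effect the abstract describes in general terms.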
Code & Models
- 🤗 speakleash/Bielik-4.5B-v3.0-Instruct-GGUF (model): 62k downloads, 16 likes
- 🤗 speakleash/Bielik-1.5B-v3.0-Instruct (model): 1.1k downloads, 14 likes
- 🤗 speakleash/Bielik-4.5B-v3.0-Instruct (model): 1.1k downloads, 29 likes
- 🤗 speakleash/Bielik-1.5B-v3 (model): 570 downloads, 4 likes
- 🤗 speakleash/Bielik-1.5B-v3.0-Instruct-GGUF (model): 457 downloads, 6 likes
- 🤗 speakleash/Bielik-1.5B-v3.0-Instruct-FP8-Dynamic (model): 305 downloads, 2 likes
- 🤗 speakleash/Bielik-4.5B-v3.0-Instruct-FP8-Dynamic (model): 389 downloads, 5 likes
- 🤗 speakleash/Bielik-4.5B-v3 (model): 244 downloads, 8 likes
- 🤗 adgw/quality_classifier_pl (model): 4 likes
- 🤗 speakleash/Bielik-11B-v3.0-Instruct (model): 369k downloads, 56 likes
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
