Long-Context Encoder Models for Polish Language Understanding

Sławomir Dadas; Rafał Poświata; Marek Kozłowski; Małgorzata Grębowiec; Michał Perełkiewicz; Paweł Klimiuk; Przemysław Boruta

arXiv:2603.12191 · cs.CL · March 13, 2026

TL;DR

This paper introduces a high-quality Polish encoder model capable of processing up to 8192 tokens, significantly improving long-document understanding while maintaining efficiency and competitive performance on short texts.

Contribution

The paper presents a Polish encoder model with an extended context window, trained with a two-stage procedure, along with compressed variants obtained via knowledge distillation.

Findings

01

Achieves state-of-the-art performance on Polish and multilingual long-context tasks.

02

Outperforms competitive models in long-document understanding.

03

Maintains comparable quality on short-text tasks.

Abstract

While decoder-only Large Language Models (LLMs) have recently dominated the NLP landscape, encoder-only architectures remain a cost-effective and parameter-efficient standard for discriminative tasks. However, classic encoders like BERT are limited by a short context window, which is insufficient for processing long documents. In this paper, we address this limitation for Polish by introducing a high-quality Polish model capable of processing sequences of up to 8192 tokens. The model was developed by employing a two-stage training procedure that involves positional embedding adaptation and full-parameter continuous pre-training. Furthermore, we propose compressed model variants trained via knowledge distillation. The models were evaluated on 25 tasks, including the KLEJ benchmark, a newly introduced financial task suite (FinBench), and other classification and regression tasks,…
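The abstract does not say which distillation objective was used for the compressed variants. As a rough illustration only, the sketch below shows the classic temperature-softened formulation of knowledge distillation, in which a student is trained to match a teacher's output distribution. The function names, the temperature value, and the choice of KL divergence are all assumptions made for this example and are not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Hypothetical helper: the generic textbook loss, not necessarily
    the objective the authors used.
    """
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    kl = sum(
        p * (math.log(p) - math.log(q))
        for p, q in zip(teacher_probs, student_probs)
        if p > 0
    )
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [1.0, 2.0, 3.0]
    print(distillation_loss(teacher, teacher))          # identical outputs
    print(distillation_loss([3.0, 2.0, 1.0], teacher))  # mismatched outputs
```

The loss is zero when the student reproduces the teacher's logits exactly and grows as the two distributions diverge. In practice this term is often combined with a task or masked-language-modeling loss.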

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification