Preferences for Idiomatic Language are Acquired Slowly -- and Forgotten Quickly: A Case Study on Swedish

Jenny Kunz

arXiv:2602.03484·cs.CL·February 4, 2026



Open Access · 4 models · 1 dataset

TL;DR

This paper studies how Swedish language models develop and lose preferences for idiomatic language, finding that idiomatic competence emerges slowly and is quickly forgotten when models are fine-tuned on translated data.

Contribution

It introduces novel datasets for assessing idiomaticity in Swedish and demonstrates that idiomatic preferences develop gradually and are fragile during fine-tuning.

Findings

01

Idiomatic competence develops more slowly than grammatical and lexical abilities.

02

Longer training improves idiomatic performance, especially in larger models.

03

Fine-tuning on translated data causes rapid loss of idiomatic preferences.

Abstract

In this study, we investigate how language models develop preferences for idiomatic as compared to linguistically acceptable Swedish, both during pretraining and when adapting a model from English to Swedish. To do so, we train models on Swedish from scratch and by fine-tuning English-pretrained models, probing their preferences at various checkpoints using minimal pairs that differ in linguistic acceptability or idiomaticity. For linguistic acceptability, we adapt existing benchmarks into a minimal-pair format. To assess idiomaticity, we introduce two novel datasets: one contrasting conventionalized idioms with plausible variants, and another contrasting idiomatic Swedish with Translationese. Our findings suggest that idiomatic competence emerges more slowly than other linguistic abilities, including grammatical and lexical correctness. While longer training yields…
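The minimal-pair probing described in the abstract can be sketched as follows. A model "prefers" one member of a pair when it gives that sentence a higher score. Accuracy is then the fraction of pairs in which the idiomatic or acceptable sentence wins.

This is a minimal sketch under several assumptions:

- Scoring by summed log-probability is a common convention. It is not necessarily the exact criterion the paper uses.
- The helper names `minimal_pair_accuracy` and `make_unigram_scorer` are hypothetical.
- The smoothed unigram model and the three-sentence corpus are toy stand-ins for a real language-model checkpoint.

```python
import math
from collections import Counter
from typing import Callable, Iterable

# A minimal pair: (preferred sentence, dispreferred sentence).
Pair = tuple[str, str]


def minimal_pair_accuracy(pairs: Iterable[Pair], score: Callable[[str], float]) -> float:
    """Fraction of pairs where `score` ranks the preferred sentence strictly higher.

    Hypothetical helper; `score` can be any sentence-level scorer.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one minimal pair")
    wins = sum(1 for good, bad in pairs if score(good) > score(bad))
    return wins / len(pairs)


def make_unigram_scorer(corpus: list[str], alpha: float = 1.0) -> Callable[[str], float]:
    """Hypothetical toy stand-in for a language model.

    Returns a function giving the add-alpha smoothed unigram log-probability
    of a whitespace-tokenized, lowercased sentence.
    """
    counts = Counter(tok for line in corpus for tok in line.lower().split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves probability mass for unseen tokens

    def score(sentence: str) -> float:
        return sum(
            math.log((counts[tok] + alpha) / (total + alpha * vocab))
            for tok in sentence.lower().split()
        )

    return score


# Toy corpus and idiom pairs (illustrative only, not taken from the paper's datasets).
corpus = [
    "han ville slå två flugor i en smäll",
    "hon tog tjuren vid hornen",
    "det var en smäll",
]
score = make_unigram_scorer(corpus)
pairs = [
    ("slå två flugor i en smäll", "slå två flugor i ett slag"),  # conventional idiom vs. variant
    ("ta tjuren vid hornen", "ta oxen vid hornen"),
]
print(minimal_pair_accuracy(pairs, score))  # prints 1.0
```

To probe a real checkpoint, `score` would be replaced by a function that returns the checkpoint's summed token log-probabilities for a sentence. Repeating the evaluation at each checkpoint yields the kind of learning curve the study compares across idiomaticity and acceptability.

Summed log-probabilities penalize longer sentences. For that reason, the toy pairs have equal token counts.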


Code & Models

Datasets

liu-nlp/swedish-idioms (dataset)


Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling