Preferences for Idiomatic Language are Acquired Slowly -- and Forgotten Quickly: A Case Study on Swedish
Jenny Kunz

TL;DR
This paper studies how Swedish language models develop and lose preferences for idiomatic language, finding that idiomatic competence emerges slowly and is quickly forgotten when models are fine-tuned on translated data.
Contribution
It introduces novel datasets for assessing idiomaticity in Swedish and demonstrates that idiomatic preferences develop gradually and are fragile during fine-tuning.
Findings
Idiomatic competence develops more slowly than grammatical and lexical abilities.
Longer training improves idiomatic performance, especially in larger models.
Fine-tuning on translated data causes rapid loss of idiomatic preferences.
Abstract
In this study, we investigate how language models develop preferences for idiomatic as compared to linguistically acceptable Swedish, both during pretraining and when adapting a model from English to Swedish. To do so, we train models on Swedish from scratch and by fine-tuning English-pretrained models, probing their preferences at various checkpoints using minimal pairs that differ in linguistic acceptability or idiomaticity. For linguistic acceptability, we adapt existing benchmarks into a minimal-pair format. To assess idiomaticity, we introduce two novel datasets: one contrasting conventionalized idioms with plausible variants, and another contrasting idiomatic Swedish with Translationese. Our findings suggest that idiomatic competence emerges more slowly than other linguistic abilities, including grammatical and lexical correctness. While longer training yields…
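The minimal-pair probing described in the abstract can be illustrated with a small sketch. It assumes that a model "prefers" a sentence when it assigns it a higher score (for example, a summed token log-probability); the abstract does not state the exact scoring rule, so that is an assumption. The function names, the toy unigram scorer standing in for a real language model, and the two example pairs are all hypothetical and are not taken from the paper's datasets.

```python
import math
from collections import Counter
from typing import Callable, Iterable, Tuple


def minimal_pair_accuracy(
    pairs: Iterable[Tuple[str, str]],
    score: Callable[[str], float],
) -> float:
    """Fraction of (preferred, dispreferred) pairs where the preferred
    sentence receives a strictly higher score."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one minimal pair")
    wins = sum(1 for good, bad in pairs if score(good) > score(bad))
    return wins / len(pairs)


def make_unigram_scorer(corpus: Iterable[str]) -> Callable[[str], float]:
    """Toy stand-in for a language model (assumption): an add-one-smoothed
    unigram log-probability over whitespace tokens."""
    counts = Counter(tok for sent in corpus for tok in sent.lower().split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen tokens

    def score(sentence: str) -> float:
        return sum(
            math.log((counts[tok] + 1) / (total + vocab))
            for tok in sentence.lower().split()
        )

    return score


# Hypothetical training text and pairs: a conventionalized Swedish idiom
# versus a plausible variant with one word swapped.
corpus = [
    "det är ingen ko på isen",
    "han fick glida in på en räkmacka",
]
pairs = [
    ("det är ingen ko på isen", "det är ingen häst på isen"),
    ("glida in på en räkmacka", "glida in på en ostmacka"),
]

score = make_unigram_scorer(corpus)
accuracy = minimal_pair_accuracy(pairs, score)
```

In a real setup, the scorer would be replaced by the checkpoint being probed, and the accuracy would be tracked across checkpoints to see when a preference for the idiomatic variant appears or disappears.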
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling
