Polish phonology and morphology through the lens of distributional semantics

Paula Orzechowska; R. Harald Baayen

arXiv:2604.00174·cs.CL·April 2, 2026

TL;DR

This paper explores how the phonological and morphological structure of Polish words is reflected in semantic space, using distributional semantics together with t-SNE, Linear Discriminant Analysis and Linear Discriminative Learning.

Contribution

It demonstrates that semantic vectors encode phonotactic and morphosyntactic information, enabling accurate predictions of linguistic features without explicit form data.

Findings

01 Phonotactic complexity can be predicted from embeddings.

02 Semantic vectors encode morphosyntactic categories like tense and case.

03 Discriminative lexicon models predict comprehension and production accurately.
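
As an illustration of what "predicting a category from embeddings" can look like, here is a minimal sketch of a two-class Fisher linear discriminant written with NumPy. It is not the authors' pipeline: the vectors are synthetic stand-ins for embeddings, and the class labels, dimensionality, and the names `X0`, `X1`, `w`, `threshold` and `predict` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins for embeddings (NOT real Polish data): two hypothetical
# classes, e.g. "past" vs "present", that differ along one dimension.
dim = 10
shift = np.zeros(dim)
shift[0] = 4.0
X0 = rng.normal(size=(50, dim))          # class 0
X1 = rng.normal(size=(50, dim)) + shift  # class 1

# Class means and pooled within-class scatter matrix.
mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)

# Fisher discriminant direction: w = Sw^{-1} (mu1 - mu0).
w = np.linalg.solve(Sw, mu1 - mu0)

# Decision threshold halfway between the projected class means.
threshold = w @ (mu0 + mu1) / 2


def predict(X):
    """Return 1 where the projection onto w exceeds the threshold, else 0."""
    return (X @ w > threshold).astype(int)


X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)

# Accuracy on the same data used for fitting; a real analysis would
# evaluate on held-out words instead.
train_accuracy = float(np.mean(predict(X) == y))
print(f"training accuracy: {train_accuracy:.2f}")
```

If a discriminant of this kind separates categories well on held-out words, that suggests the category is linearly recoverable from the vectors, which is the sense in which embeddings can be said to encode it.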

Abstract

This study investigates the relationship between the phonological and morphological structure of Polish words and their meanings using Distributional Semantics. In the present analysis, we ask whether there is a relationship between the form properties of words containing consonant clusters and their meanings. Is the phonological and morphonological structure of complex words mirrored in semantic space? We address these questions for Polish, a language characterized by non-trivial morphology and an impressive inventory of morphologically motivated consonant clusters. We use statistical and computational techniques, such as t-SNE, Linear Discriminant Analysis and Linear Discriminative Learning, and demonstrate that, apart from encoding rich morphosyntactic information (e.g. tense, number, case), semantic vectors capture information on sub-lexical linguistic units such as phoneme…
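
Linear Discriminative Learning, as generally described in the discriminative lexicon literature, estimates linear mappings between a form matrix and a semantic matrix. The sketch below shows that idea on toy data, assuming a binary cue matrix `C` (rows are words, columns are form cues) and a matrix `S` of made-up semantic vectors. The cue coding, the dimensions, the random vectors, and the helper `accuracy` are all hypothetical and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy form matrix (hypothetical): 4 words x 5 binary form cues,
# e.g. presence or absence of a particular triphone.
C = np.array([
    [1, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
], dtype=float)

# Toy semantic matrix: one made-up 3-dimensional vector per word.
S = rng.normal(size=(4, 3))

# Comprehension mapping F (form -> meaning): least-squares solution of C F = S.
F = np.linalg.pinv(C) @ S
S_hat = C @ F

# Production mapping G (meaning -> form): least-squares solution of S G = C.
G = np.linalg.pinv(S) @ C
C_hat = S @ G


def accuracy(predicted, gold):
    """Share of rows whose prediction correlates best with its own gold row."""
    n = gold.shape[0]
    # np.corrcoef stacks both matrices row-wise; the upper-right block holds
    # the correlations between predicted rows and gold rows.
    corr = np.corrcoef(predicted, gold)[:n, n:]
    return float(np.mean(np.argmax(corr, axis=1) == np.arange(n)))


comprehension_accuracy = accuracy(S_hat, S)
production_accuracy = accuracy(C_hat, C)
print(comprehension_accuracy, production_accuracy)
```

In this toy setup the rows of `C` are linearly independent, so the comprehension mapping reproduces `S` up to floating-point error, while the production mapping is only a least-squares approximation because three semantic dimensions cannot fully determine five cues. Real data would involve far more words and cues, with accuracy assessed on held-out items.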
