Polish phonology and morphology through the lens of distributional semantics
Paula Orzechowska, R. Harald Baayen

TL;DR
This paper uses distributional semantics and computational models to examine how the phonological and morphological structure of Polish words is reflected in semantic space.
Contribution
It demonstrates that semantic vectors encode phonotactic and morphosyntactic information, enabling accurate predictions of linguistic features without explicit form data.
Findings
Phonotactic complexity can be predicted from embeddings.
Semantic vectors encode morphosyntactic categories like tense and case.
Discriminative lexicon models predict comprehension and production accurately.
Abstract
This study investigates the relationship between the phonological and morphological structure of Polish words and their meanings using Distributional Semantics. In the present analysis, we ask whether there is a relationship between the form properties of words containing consonant clusters and their meanings. Is the phonological and morphonological structure of complex words mirrored in semantic space? We address these questions for Polish, a language characterized by non-trivial morphology and an impressive inventory of morphologically motivated consonant clusters. We use statistical and computational techniques, such as t-SNE, Linear Discriminant Analysis and Linear Discriminative Learning, and demonstrate that, apart from encoding rich morphosyntactic information (e.g. tense, number, case), semantic vectors capture information on sub-lexical linguistic units such as phoneme…
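To make the idea behind Linear Discriminative Learning more concrete, here is a minimal sketch of its two linear mappings, one from form to meaning (comprehension) and one from meaning to form (production), each estimated by least squares. Everything in it is an illustrative assumption rather than the authors' setup: the data are random toy matrices, not Polish words or real embeddings, and the names `fit_linear_mapping`, `nearest_row_accuracy`, `C`, `S`, `F` and `G` are hypothetical.

```python
import numpy as np


def fit_linear_mapping(X, Y):
    """Return the least-squares W that minimizes ||X W - Y||."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W


def nearest_row_accuracy(predicted, targets):
    """Fraction of predicted rows whose most cosine-similar target row is their own."""
    p = predicted / np.linalg.norm(predicted, axis=1, keepdims=True)
    t = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    best = (p @ t.T).argmax(axis=1)
    return float((best == np.arange(len(targets))).mean())


rng = np.random.default_rng(0)
n_words, n_cues, n_dims, cues_per_word = 50, 80, 20, 6

# Toy form matrix C: each word activates a few binary sub-lexical cues
# (stand-ins for units such as phoneme n-grams; assumption, not real data).
C = np.zeros((n_words, n_cues))
for i in range(n_words):
    C[i, rng.choice(n_cues, size=cues_per_word, replace=False)] = 1.0

# Toy semantic matrix S: one random "embedding" per word (assumption).
S = rng.normal(size=(n_words, n_dims))

F = fit_linear_mapping(C, S)  # comprehension mapping: form -> meaning
G = fit_linear_mapping(S, C)  # production mapping: meaning -> form

comprehension_acc = nearest_row_accuracy(C @ F, S)
production_acc = nearest_row_accuracy(S @ G, C)
print(f"comprehension accuracy: {comprehension_acc:.2f}")
print(f"production accuracy:    {production_acc:.2f}")
```

In this sketch a prediction counts as correct when the predicted vector is closest, by cosine similarity, to the target vector of the same word. Accuracies on random toy data say nothing about Polish; the point is only the shape of the computation.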
