A Data-Driven Approach to Idiomaticity Based on Experts' Criteria in Theoretical Linguistics
Elena Mikhalkova, Anastasiya Vishnyakova, Anastasiya Drozdova, Polina Gavin, Aleksander Zhmykhov, Timofey Protasov

TL;DR
This study analyzes 286 multi-word expressions using expert-annotated criteria to understand idiomaticity, revealing that no expression is entirely idiomatic and highlighting the influence of lexical and grammatical factors.
Contribution
It provides a data-driven analysis of idiomaticity based on expert annotations, emphasizing the nuanced influence of lexical and grammatical criteria.
Findings
No expression is completely idiomatic.
Lexical criteria are most influential.
Obsolete words and grammar affect whether an MWE can be replaced with a single word.
Abstract
The article presents a data analysis of 286 multi-word expressions (MWEs) based on 16 lexical, grammatical, and other criteria described in theoretical books and papers on the notion of idiomaticity. The MWEs were collected from the same theoretical sources, and a group of experts in linguistics annotated them with these categories. The distribution of categories shows that there are no absolutely idiomatic expressions. Lexical criteria seem to be the most influential; grammatical criteria are bound to certain conditions; and the presence of obsolete words and grammar influences the ability of an MWE to be replaced with a single word.
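The claim that "no expression is absolutely idiomatic" can be read as a check over the annotation table: an absolutely idiomatic MWE would be one marked with every criterion. The sketch below illustrates that reading under stated assumptions. It is not the authors' code. It assumes binary (applies / does not apply) annotations, and it uses four hypothetical criterion names and three toy MWEs invented for illustration in place of the paper's 16 criteria and 286 expressions.

```python
from collections import Counter

# Hypothetical criterion names (placeholders, NOT the paper's 16 criteria).
CRITERIA = ["non_compositional", "obsolete_word", "fixed_grammar", "replaceable_by_one_word"]

# Toy annotations invented for illustration (NOT the paper's data):
# each MWE maps to the set of criteria an expert marked as applying to it.
annotations = {
    "kick the bucket": {"non_compositional", "fixed_grammar", "replaceable_by_one_word"},
    "by and large": {"non_compositional", "fixed_grammar"},
    "heavy rain": set(),
}


def criterion_distribution(annotations, criteria):
    """Count how many MWEs were marked with each criterion."""
    counts = Counter()
    for marked in annotations.values():
        counts.update(marked)
    # Report every criterion, including those no MWE was marked with.
    return {c: counts[c] for c in criteria}


def fully_idiomatic(annotations, criteria):
    """Return the MWEs marked with every criterion (the 'absolutely idiomatic' case)."""
    required = set(criteria)
    return [mwe for mwe, marked in annotations.items() if required <= marked]


if __name__ == "__main__":
    print(criterion_distribution(annotations, CRITERIA))
    print(fully_idiomatic(annotations, CRITERIA))
```

On this toy table, `fully_idiomatic` returns an empty list because no MWE is marked with all four criteria. That mirrors, in miniature, the kind of observation the abstract reports for the real annotated set.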
