TL;DR
This paper introduces an information-theoretic method for detecting small but significant cross-linguistic form-meaning associations. The average effect is under 0.5%, yet about a quarter of the concepts considered show notable non-arbitrariness.
Contribution
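It extends methods previously used to detect within-language non-arbitrariness (Pimentel et al., 2019) to the cross-linguistic setting, controlling for language family and geographic proximity. The approach scales to a very large concept-aligned, cross-lingual lexicon and adds a concept-level analysis of form-meaning biases.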
Findings
Cross-linguistic non-arbitrariness is significant but small (less than 0.5% on average, according to the paper's information-theoretic estimate).
Approximately 25% of concepts exhibit notable non-arbitrariness.
The extended method enables large-scale detection of form-meaning associations.
Abstract
This work presents an information-theoretic operationalisation of cross-linguistic non-arbitrariness. It is not a new idea that there are small, cross-linguistic associations between the forms and meanings of words. For instance, it has been claimed (Blasi et al., 2016) that the word for "tongue" is more likely than chance to contain the phone [l]. By controlling for the influence of language family and geographic proximity within a very large concept-aligned, cross-lingual lexicon, we extend methods previously used to detect within-language non-arbitrariness (Pimentel et al., 2019) to measure cross-linguistic associations. We find that there is a significant effect of non-arbitrariness, but it is unsurprisingly small (less than 0.5% on average according to our information-theoretic estimate). We also provide a concept-level analysis which shows that a quarter of the concepts considered…
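To make the idea of an information-theoretic estimate concrete, here is a minimal sketch of one common way to quantify a form-meaning association: the mutual information between a concept and a form feature, expressed as a fraction of the feature's entropy. This summary does not specify the paper's estimator, so everything below is an assumption for illustration: the function names `entropy` and `uncertainty_reduction`, the count-based plug-in estimate, and the toy data are all hypothetical. The sketch also omits the controls for language family and geographic proximity that the abstract describes.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def uncertainty_reduction(pairs):
    """Plug-in estimate for (concept, feature) pairs.

    Returns (H(F), I(F; C), I(F; C) / H(F)), where F is the form feature
    and C is the concept. This is a toy estimator, not the paper's method.
    """
    n = len(pairs)
    h_f = entropy(Counter(f for _, f in pairs).values())

    # Group feature counts by concept to compute H(F | C).
    by_concept = {}
    for concept, feature in pairs:
        by_concept.setdefault(concept, Counter())[feature] += 1

    # H(F | C) = sum over concepts of p(c) * H(F | C = c)
    h_f_given_c = sum(
        sum(cnt.values()) / n * entropy(cnt.values())
        for cnt in by_concept.values()
    )

    mi = h_f - h_f_given_c
    ratio = mi / h_f if h_f > 0 else 0.0
    return h_f, mi, ratio


# Hypothetical toy data: does a language's word for the concept contain [l]?
toy = (
    [("tongue", True)] * 6 + [("tongue", False)] * 4
    + [("stone", True)] * 4 + [("stone", False)] * 6
)

h, mi, ratio = uncertainty_reduction(toy)
print(f"H(F) = {h:.3f} bits, I(F;C) = {mi:.4f} bits, reduction = {ratio:.2%}")
```

A ratio near zero means that knowing the concept tells you almost nothing about the form, which is the sense in which an effect can be significant yet small. Plug-in estimates like this one are biased upward on small samples, so the sketch is not suitable for real lexical data.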
