Finding Concept-specific Biases in Form–Meaning Associations

Tiago Pimentel; Brian Roark; Søren Wichmann; Ryan Cotterell; Damián Blasi

arXiv:2104.06325·cs.CL·April 30, 2021

TL;DR

This paper introduces an information-theoretic method to detect small but significant cross-linguistic form-meaning associations, revealing that a quarter of concepts show notable non-arbitrariness despite overall minor effects.

Contribution

It extends methods previously used to detect within-language non-arbitrariness (Pimentel et al., 2019) to cross-linguistic associations, controlling for language family and geographic proximity, and adds a concept-level analysis of form-meaning biases.

Findings

01

A significant cross-linguistic non-arbitrariness effect exists, but it is small: less than 0.5% on average according to the paper's information-theoretic estimate.

02

Approximately 25% of concepts exhibit notable non-arbitrariness.

03

New methods enable large-scale detection of form-meaning associations.

Abstract

This work presents an information-theoretic operationalisation of cross-linguistic non-arbitrariness. It is not a new idea that there are small, cross-linguistic associations between the forms and meanings of words. For instance, it has been claimed (Blasi et al., 2016) that the word for "tongue" is more likely than chance to contain the phone [l]. By controlling for the influence of language family and geographic proximity within a very large concept-aligned, cross-lingual lexicon, we extend methods previously used to detect within-language non-arbitrariness (Pimentel et al., 2019) to measure cross-linguistic associations. We find that there is a significant effect of non-arbitrariness, but it is unsurprisingly small (less than 0.5% on average according to our information-theoretic estimate). We also provide a concept-level analysis which shows that a quarter of the concepts considered…
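To make the idea of an information-theoretic measure of form-meaning association concrete, the following is a minimal sketch, not the paper's estimator. It assumes a simple plug-in (count-based) estimate of the mutual information between a concept label and one binary form feature (whether a word contains the phone [l]), and reports it as a fraction of the feature's entropy. The word list `toy_pairs`, the function names, and the choice of feature are all hypothetical and exist only for illustration; the sketch applies no controls for language family or geographic proximity, and plug-in estimates are known to be biased upward on small samples.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of the empirical distribution given by counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def mutual_information(pairs):
    """Plug-in estimate of I(X; Y) = H(X) + H(Y) - H(X, Y), in bits.

    `pairs` is a list of (x, y) observations.
    """
    h_x = entropy(Counter(x for x, _ in pairs))
    h_y = entropy(Counter(y for _, y in pairs))
    h_xy = entropy(Counter(pairs))
    return h_x + h_y - h_xy


def uncertainty_coefficient(pairs):
    """Share of the form feature's entropy explained by the concept: I(X; Y) / H(Y)."""
    h_y = entropy(Counter(y for _, y in pairs))
    return mutual_information(pairs) / h_y if h_y > 0 else 0.0


# Hypothetical toy data: (concept, word contains the phone [l]) for eight
# made-up word forms. These are NOT values from the paper or its lexicon.
toy_pairs = [
    ("tongue", True), ("tongue", True), ("tongue", True), ("tongue", False),
    ("stone", True), ("stone", False), ("stone", False), ("stone", False),
]

if __name__ == "__main__":
    print(f"I(concept; contains [l]) = {mutual_information(toy_pairs):.4f} bits")
    print(f"fraction of feature entropy explained = {uncertainty_coefficient(toy_pairs):.2%}")
```

A value near zero would indicate that knowing the concept tells you almost nothing about the form feature, which is the arbitrary case; the abstract's figure of less than 0.5% is an average effect of this general kind, computed with the authors' own estimator and controls rather than the count-based shortcut used here.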
