TL;DR
This paper introduces a grounded, multimodal approach to language typology. It quantifies semantic contentfulness across languages and word classes with a new measure, groundedness, which reveals universal trends and contradicts the view that functional classes convey no content.
Contribution
It proposes a new groundedness measure based on information theory, applied to multilingual multimodal data, to analyze semantic content in word classes across languages.
Findings
Groundedness captures contentfulness asymmetry between functional and lexical classes.
Universal trends appear in the hierarchy of groundedness (e.g., nouns > adjectives > verbs).
Groundedness partly correlates with psycholinguistic concreteness norms.
Abstract
We propose a grounded approach to meaning in language typology. We treat data from perceptual modalities, such as images, as a language-agnostic representation of meaning. Hence, we can quantify the function–form relationship between images and captions across languages. Inspired by information theory, we define "groundedness", an empirical measure of contextual semantic contentfulness (formulated as a difference in surprisal) which can be computed with multilingual multimodal language models. As a proof of concept, we apply this measure to the typology of word classes. Our measure captures the contentfulness asymmetry between functional (grammatical) and lexical (content) classes across languages, but contradicts the view that functional classes do not convey content. Moreover, we find universal trends in the hierarchy of groundedness (e.g., nouns > adjectives > verbs), and show that…
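The abstract describes groundedness only as "a difference in surprisal", so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes the difference is taken between a token's surprisal under a text-only model and its surprisal under a model that also sees the image, so that a positive value means the image made the token more predictable. The function names and the probabilities in the usage note are hypothetical and chosen for illustration only.

```python
import math
from typing import List, Sequence


def surprisal(prob: float) -> float:
    """Surprisal, in bits, of an event with probability `prob`."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


def groundedness(
    p_text_only: Sequence[float],
    p_with_image: Sequence[float],
) -> List[float]:
    """Per-token difference in surprisal (assumed direction).

    p_text_only[i]  : probability of token i given the preceding caption
                      tokens only (hypothetical text-only model).
    p_with_image[i] : probability of token i given the preceding caption
                      tokens and the image (hypothetical multimodal model).

    Returns surprisal(text only) - surprisal(with image) for each token.
    """
    if len(p_text_only) != len(p_with_image):
        raise ValueError("both sequences must cover the same tokens")
    return [
        surprisal(p_t) - surprisal(p_i)
        for p_t, p_i in zip(p_text_only, p_with_image)
    ]
```

Under this sketch, a token whose probability rises from 0.25 to 0.5 once the image is available scores 1 bit, and a token whose probability is unchanged scores 0. Averaging such scores over all tokens of a word class would give one class-level value per language, which is the kind of quantity the hierarchy in the findings compares.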
