A Game Interface to Study Semantic Grounding in Text-Based Models
Timothee Mickus, Mathieu Constant, Denis Paperno

TL;DR
This paper introduces an online game for collecting human judgments on the distributional similarity of word pairs in five languages, with the goal of testing whether text-based models can learn grounded representations from textual data alone.
Contribution
It presents a game-based method for collecting multilingual human judgments on the distributional similarity of word pairs, intended to test semantic grounding in text-based language models.
Findings
Initial data collection is underway across five languages.
Early results show variability in human judgments.
The game offers an experimental way to test semantic grounding: it checks whether words with different meanings can be told apart from distribution alone.
Abstract
Can language models learn grounded representations from text distribution alone? This question is both central and recurrent in natural language processing; authors generally agree that grounding requires more than textual distribution. We propose to experimentally test this claim: if any two words have different meanings and yet cannot be distinguished from distribution alone, then grounding is out of the reach of text-based models. To that end, we present early work on an online game for the collection of human judgments on the distributional similarity of word pairs in five languages. We further report early results of our data collection campaign.
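As a rough illustration of the claim being tested, the sketch below computes a simple distributional similarity between two words from co-occurrence counts. It is not the authors' method: the toy corpus, the window size, and the function names `cooccurrence_vector` and `cosine` are all assumptions made for this example. In this contrived corpus, "left" and "right" differ in meaning but occur in identical contexts, so distribution alone cannot separate them.

```python
from collections import Counter
import math


def cooccurrence_vector(target, sentences, window=2):
    """Count the words appearing within `window` tokens of `target`."""
    counts = Counter()
    for sent in sentences:
        for i, tok in enumerate(sent):
            if tok != target:
                continue
            lo = max(0, i - window)
            hi = min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[sent[j]] += 1
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus (an assumption for illustration only).
toy_corpus = [
    "turn left at the corner".split(),
    "turn right at the corner".split(),
    "raise your left hand".split(),
    "raise your right hand".split(),
]

left = cooccurrence_vector("left", toy_corpus)
right = cooccurrence_vector("right", toy_corpus)
corner = cooccurrence_vector("corner", toy_corpus)

# "left" and "right" share every context here, so their similarity is
# close to 1; "left" and "corner" share only some contexts.
sim_left_right = cosine(left, right)
sim_left_corner = cosine(left, corner)
print(sim_left_right, sim_left_corner)
```

A real study would need large corpora and a stronger distributional model; the point here is only that identical contexts yield indistinguishable vectors regardless of meaning.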
Taxonomy
Topics: Topic Modeling · Language and cultural evolution · Computational and Text Analysis Methods
