Contrastive Multimodal Learning for Emergence of Graphical Sensory-Motor Communication
Tristan Karch, Yoann Lemesle, Romain Laroche, Clément Moulin-Frier, Pierre-Yves Oudeyer

TL;DR
This paper introduces a multimodal contrastive learning framework enabling artificial agents to develop a shared graphical language through a referential game, demonstrating generalization and emergent structure in communication.
Contribution
The paper presents CURVES, a novel contrastive deep learning method for emergent graphical communication in agents, and introduces the GREG game to study language development in sensory-motor contexts.
Findings
Agents successfully communicate using graphical utterances in GREG.
Emergent language generalizes to unseen feature combinations.
Shared lexicon and basic compositional rules are observed in the emergent language.
Abstract
In this paper, we investigate whether artificial agents can develop a shared language in an ecological setting where communication relies on a sensory-motor channel. To this end, we introduce the Graphical Referential Game (GREG) where a speaker must produce a graphical utterance to name a visual referent object while a listener has to select the corresponding object among distractor referents, given the delivered message. The utterances are drawing images produced using dynamical motor primitives combined with a sketching library. To tackle GREG we present CURVES: a multimodal contrastive deep learning mechanism that represents the energy (alignment) between named referents and utterances generated through gradient ascent on the learned energy landscape. We demonstrate that CURVES not only succeeds at solving the GREG but also enables agents to self-organize a language that generalizes…
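The abstract's two core ideas, an energy that scores how well an utterance aligns with a referent and utterance generation by gradient ascent on that energy, can be illustrated with a toy sketch. This is not the authors' implementation. It assumes fixed random linear encoders (`A`, `B`) in place of learned contrastive networks, a raw parameter vector in place of a drawing produced by dynamical motor primitives, and a negative squared distance as the energy. All names (`energy`, `speak`, `listen`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

REF_DIM, UTT_DIM, EMB_DIM = 6, 4, 4

# ASSUMPTION: stand-ins for learned encoders. In the paper these would be
# trained contrastively; here they are fixed random linear maps.
A = rng.normal(size=(EMB_DIM, REF_DIM))               # referent encoder
# Orthogonal utterance encoder, chosen only so that plain gradient ascent
# is well conditioned in this toy setting.
B = np.linalg.qr(rng.normal(size=(EMB_DIM, UTT_DIM)))[0]


def energy(referent, utterance):
    """Toy alignment score: negative squared distance between embeddings."""
    diff = A @ referent - B @ utterance
    return -float(diff @ diff)


def energy_grad_wrt_utterance(referent, utterance):
    """Analytic gradient of energy() with respect to the utterance vector."""
    diff = A @ referent - B @ utterance
    return 2.0 * (B.T @ diff)


def speak(referent, steps=100, lr=0.1):
    """Speaker: start from a blank utterance and climb the energy landscape."""
    utterance = np.zeros(UTT_DIM)
    for _ in range(steps):
        utterance = utterance + lr * energy_grad_wrt_utterance(referent, utterance)
    return utterance


def listen(utterance, candidates):
    """Listener: pick the candidate referent with the highest energy."""
    scores = [energy(c, utterance) for c in candidates]
    return int(np.argmax(scores))
```

In a round of the game, the speaker would call `speak` on the target referent and the listener would call `listen` on the resulting utterance together with the target and distractors; the round succeeds when the returned index matches the target. In the paper's setting both agents learn their own energy models, whereas this sketch shares one fixed energy between them for brevity.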
Taxonomy
Topics: Language and cultural evolution · Robotics and Automated Systems · Speech and dialogue systems
