Emergent Communication between Heterogeneous Visual Agents through Decentralized Learning

Mikako Ochiai, Masatoshi Nagano, and Tadahiro Taniguchi

arXiv:2605.11695 · cs.CV · May 13, 2026

TL;DR

This paper investigates how heterogeneous visual agents can develop shared symbols through decentralized learning without shared perception, using the Metropolis-Hastings Captioning Game and experiments on MS-COCO.

Contribution

The paper introduces the Metropolis-Hastings Captioning Game (MHCG) to study emergent communication between agents with different visual representations, and examines how the similarity between their private visual spaces constrains the content and symmetry of the resulting language.
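
To give a feel for what a Metropolis-Hastings style exchange between two agents might look like, here is a minimal Python sketch. It is not the authors' implementation: the abstract available here is truncated before it states the listener's rule, so the acceptance ratio below (the listener compares the proposed and current token sequences under its own private scores only) is an assumption, and `mh_accept_prob`, `listener_step`, and the toy score table are hypothetical names and values chosen for illustration.

```python
import math
import random


def mh_accept_prob(logp_proposed: float, logp_current: float) -> float:
    """Return min(1, p_proposed / p_current), computed in log space.

    Assumption: both log-probabilities come from the listener's own model,
    so no perceptual information is shared between agents.
    """
    return math.exp(min(0.0, logp_proposed - logp_current))


def listener_step(current, proposed, listener_logp, rng):
    """Keep or replace the listener's caption after one proposal.

    `listener_logp` is a hypothetical callable that scores a token sequence
    against the listener's private visual evidence.
    """
    accept = mh_accept_prob(listener_logp(proposed), listener_logp(current))
    return proposed if rng.random() < accept else current


# Toy, made-up scores of three token sequences under the listener's model.
toy_scores = {
    ("t3", "t7"): -1.0,
    ("t3", "t9"): -2.0,
    ("t5", "t1"): -6.0,
}

rng = random.Random(0)
caption = ("t3", "t9")
# A proposal the listener scores higher than its current caption
# has acceptance probability 1, so it is always adopted.
caption = listener_step(caption, ("t3", "t7"), toy_scores.get, rng)
print(caption)
```

The point of the sketch is only that each agent can decide locally, from received tokens and its own scores, whether to adopt a proposal; proposals the listener scores lower are accepted with probability equal to the likelihood ratio rather than rejected outright.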

Findings

01. Shared token sequences are visually informative and outperform no-communication baselines.

02. Cross-agent alignment decreases with increasing encoder mismatch.

03. Encoder heterogeneity affects the number, specificity, and symmetry of emergent symbols.

Abstract

Symbols are shared, but perception is private. We study emergent communication between heterogeneous visual agents through decentralized learning, asking what visual information can become shareable when agents have different visual representations. Instead of optimizing messages through a shared external communicative objective, our agents exchange only discrete token sequences and update their own models using local perceptual evidence. This setting focuses on an underexplored aspect of emergent communication, examining whether common symbols can arise without shared perceptual access, and how the similarity between private visual spaces constrains the content and symmetry of the resulting language. We instantiate this setting in the Metropolis-Hastings Captioning Game (MHCG), where two agents collaboratively form shared captions by exchanging proposed token sequences that a listener…
