Saying the Unsaid: Revealing the Hidden Language of Multimodal Systems Through Telephone Games
Juntu Zhao, Jialing Zhang, Chongxuan Li, Dequan Wang

TL;DR
This paper probes the "hidden language" of closed-source multimodal systems with multi-round telephone games: images are repeatedly compressed into text and reconstructed, and the resulting shifts in concept co-occurrence reveal how strongly the systems connect concepts and where their preference biases lie.
Contribution
It introduces a telephone game framework and Telescope, a dataset of 10,000+ concept pairs, to study the hidden concept connections and biases of multimodal systems.
Findings
Identifies specific biases in multimodal systems' concept co-occurrence
Constructs a global map of concept connections through iterative telephone games
Uncovers unexpected concept relationships using Reasoning-LLMs
Abstract
Recent closed-source multimodal systems have made great advances, but their hidden language for understanding the world remains opaque because of their black-box architectures. In this paper, we use the systems' preference bias to study their hidden language: During the process of compressing the input images (typically containing multiple concepts) into texts and then reconstructing them into images, the systems' inherent preference bias introduces specific shifts in the outputs, disrupting the original input concept co-occurrence. We employ the multi-round "telephone game" to strategically leverage this bias. By observing the co-occurrence frequencies of concepts in telephone games, we quantitatively investigate the concept connection strength in the understanding of multimodal systems, i.e., "hidden language." We also contribute Telescope, a dataset of 10,000+ concept pairs, as the…
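The co-occurrence measurement described in the abstract can be illustrated with a minimal sketch. It assumes each telephone game has already been run and that every round has been reduced to a set of concept labels. The function name `cooccurrence_frequency`, the input layout, and the toy data are hypothetical and chosen for illustration; the paper's actual scoring of connection strength may differ.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_frequency(games):
    """Estimate how often each concept pair appears together.

    `games` is a list of telephone games; each game is a list of rounds,
    and each round is a set of concept labels detected in that round's
    output. This layout is an assumption made for illustration only.
    Returns a dict mapping a sorted (concept_a, concept_b) tuple to the
    fraction of all rounds in which both concepts were present.
    """
    pair_counts = Counter()
    total_rounds = 0
    for game in games:
        for concepts in game:
            total_rounds += 1
            # Sort so that each unordered pair maps to a single key.
            for a, b in combinations(sorted(set(concepts)), 2):
                pair_counts[(a, b)] += 1
    if total_rounds == 0:
        return {}
    return {pair: n / total_rounds for pair, n in pair_counts.items()}


# Toy data (invented): two games of three rounds each.
games = [
    [{"dog", "ball"}, {"dog", "ball", "grass"}, {"dog", "grass"}],
    [{"cat", "sofa"}, {"cat", "sofa"}, {"cat"}],
]
freq = cooccurrence_frequency(games)
for pair, value in sorted(freq.items()):
    print(pair, round(value, 3))
```

In this toy run, "grass" enters the first game in round two and "ball" drops out in round three, which is the kind of drift a multi-round game is meant to expose. Normalizing by the total number of rounds is only one simple choice; normalizing per game or per concept would be equally reasonable under different assumptions.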
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Language and cultural evolution
