CtD: Composition through Decomposition in Emergent Communication
Boaz Carmeli, Ron Meir, Yonatan Belinkov

TL;DR
This paper introduces a two-step method enabling neural agents to generalize compositionally and describe unseen images, mimicking human-like systematic concept combination, with zero-shot generalization observed in some cases.
Contribution
The study presents a novel 'Composition through Decomposition' approach that allows neural agents to acquire and apply compositional generalization in image description tasks.
Findings
Agents can decompose images into basic concepts using a learned codebook.
Agents can compose concepts to describe new images, achieving zero-shot generalization.
The method demonstrates systematic compositionality in emergent communication.
Abstract
Compositionality is a cognitive mechanism that allows humans to systematically combine known concepts in novel ways. This study demonstrates how artificial neural agents acquire and utilize compositional generalization to describe previously unseen images. Our method, termed "Composition through Decomposition", involves two sequential training steps. In the 'Decompose' step, the agents learn to decompose an image into basic concepts using a codebook acquired during interaction in a multi-target coordination game. Subsequently, in the 'Compose' step, the agents employ this codebook to describe novel images by composing basic concepts into complex phrases. Remarkably, we observe cases where generalization in the `Compose' step is achieved zero-shot, without the need for additional training.
Peer Reviews
Decision·ICLR 2025 Poster
The approach of "Composition through Decomposition" (CtD) method stands out as a two-step approach where agents first learn to decompose complex objects into simpler concepts before recomposing them. It is an interesting way to get agents to perform compositional inference. Using a discrete codebook to handle basic concept representations is well-grounded, showing the potential for generalization without additional training. This idea, inspired by linguistic encoding methods, enables efficient
"Several methods have been proposed and evaluated during recent years by the emergent communication (EC) research community..." -> isn't meta-learning another one to mention here? "Notably, our method achieves extremely high performance, characterized by perfect accuracy and compositionality, on multiple datasets." -> just say it achieves perfect accuracy, no need to say extremely high "we assert that before effectively composing basic concepts into complex ones, agents must acquire the abilit
Originality: Fairly original, builds upon previous work but with somewhat novel contributions Quality: Very pertinent experiments on a wide range of tasks with a wide range of metrics Clarity: Sections 1, 2, 4, 5, 6, and 7 are very clear and well written Significance: Significant contribution to understanding how compositionality can emerge in a learning system, which is an important concept to understand faster and more effective learning
1. It seems to me that in order to properly do the CtD approach, you need to already know the concepts you want to compose ahead of time in order to setup the 2 phases of training. I don't know how realistic this is in real-world settings. 2. I think the paper should make it clear what the novel part of the work is (from what I understand it is only the 2 phases of training), because now it seems like all 3 parts of section 3 are novel Potential additional citation: - https://arxiv.org/pdf/200
- The presentation is good, and the paper is easy to follow. - The experiments are well conducted, covering both toy data and real images. - The conclusions drawn from the experiments are inspiring. (E.g., I like the results in Figure 3.) - The zero-shot results are interesting.
- The paper doesn't mention the multi-generation-based approach for the compositional generalization problem. Actually, the two-stage training proposed here can also be analyzed using similar theoretical tools provided in related works, e.g., [3]. Furthermore, [1] also studies the compositional generalization problem in the context of emergent communication, which is identical to the one studied in this paper. Such an iterated training fashion has proven to be effective in many other fields, e.g
Videos
Taxonomy
TopicsEmbodied and Extended Cognition · Language and cultural evolution · Face Recognition and Perception
