Semi-supervised Multimodal Representation Learning through a Global Workspace
Benjamin Devillers, Léopold Maytié, Rufin VanRullen

TL;DR
This paper introduces a global workspace architecture for multimodal learning that aligns and translates between modalities with minimal supervised data, inspired by cognitive theories, and demonstrates its effectiveness across vision-language tasks.
Contribution
The paper proposes a novel global workspace model for multimodal learning that uses self-supervised cycle-consistency, reducing the need for large labeled datasets and improving transfer capabilities.
Findings
Achieves multimodal alignment and translation with 4 to 7 times less matched (supervised) data than a fully supervised approach.
Effective for downstream classification and transfer learning.
Both shared workspace and cycle-consistency are crucial for performance.
Abstract
Recent deep learning models can efficiently combine inputs from different modalities (e.g., images and text) and learn to align their latent representations, or to translate signals from one domain to another (as in image captioning, or text-to-image generation). However, current approaches mainly rely on brute-force supervised training over large multimodal datasets. In contrast, humans (and other animals) can learn useful multimodal representations from only sparse experience with matched cross-modal data. Here we evaluate the capabilities of a neural network architecture inspired by the cognitive notion of a "Global Workspace": a shared representation for two (or more) input modalities. Each modality is processed by a specialized system (pretrained on unimodal data, and subsequently frozen). The corresponding latent representations are then encoded to and decoded from a single shared…
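The architecture described in the abstract can be illustrated with a minimal sketch. It assumes two frozen unimodal latent spaces (here labeled "v" for vision and "t" for text), linear encoders and decoders to and from a small shared workspace, and mean-squared-error losses. The class name `Workspace`, the helper functions, the dimensions, and the random untrained weights are all hypothetical choices for illustration and are not taken from the paper. The sketch only shows which loss needs matched pairs (translation) and which does not (cycle-consistency).

```python
import random


def linear(n_out, n_in, rng):
    """Random weight matrix standing in for a trainable linear map (assumption)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def matvec(w, x):
    """Matrix-vector product of w (list of rows) with vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


class Workspace:
    """Toy shared workspace linking two frozen unimodal latent spaces."""

    def __init__(self, dim_v, dim_t, dim_gw, seed=0):
        rng = random.Random(seed)
        # One encoder into the workspace and one decoder out of it per modality.
        self.enc = {"v": linear(dim_gw, dim_v, rng), "t": linear(dim_gw, dim_t, rng)}
        self.dec = {"v": linear(dim_v, dim_gw, rng), "t": linear(dim_t, dim_gw, rng)}

    def translate(self, x, src, dst):
        """Encode a src latent into the workspace, then decode it as dst."""
        return matvec(self.dec[dst], matvec(self.enc[src], x))

    def translation_loss(self, x_src, x_dst, src, dst):
        """Supervised: compares a translation with its matched target."""
        return mse(self.translate(x_src, src, dst), x_dst)

    def cycle_loss(self, x, src, dst):
        """Unsupervised: src -> dst -> src should return to the input."""
        back = self.translate(self.translate(x, src, dst), dst, src)
        return mse(back, x)


gw = Workspace(dim_v=4, dim_t=3, dim_gw=2)
rng = random.Random(1)
z_v = [rng.gauss(0.0, 1.0) for _ in range(4)]  # stand-in for a frozen vision latent
z_t = [rng.gauss(0.0, 1.0) for _ in range(3)]  # stand-in for a frozen text latent

# Needs a matched (z_v, z_t) pair.
supervised = gw.translation_loss(z_v, z_t, "v", "t") + gw.translation_loss(z_t, z_v, "t", "v")
# Needs only unpaired samples from each modality.
unsupervised = gw.cycle_loss(z_v, "v", "t") + gw.cycle_loss(z_t, "t", "v")
```

In a real training loop the weights would be optimized on a weighted sum of such terms, with the cycle term computed on the much larger pool of unmatched data. The weighting and any further loss terms are left out here because the summary above does not specify them.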
Taxonomy
Topics: Multimodal Machine Learning Applications · Cancer-related molecular mechanisms research · Domain Adaptation and Few-Shot Learning
Methods: ALIGN
