Semi-supervised Multimodal Representation Learning through a Global Workspace

Benjamin Devillers; Léopold Maytié; Rufin VanRullen

arXiv:2306.15711 · cs.AI · November 27, 2025

TL;DR

This paper introduces a multimodal learning architecture inspired by the cognitive notion of a global workspace. The architecture aligns and translates between modalities using little matched (supervised) data, and the paper demonstrates its effectiveness on vision-language tasks.

Contribution

The paper proposes a novel global workspace model for multimodal learning that uses self-supervised cycle-consistency, reducing the need for large labeled datasets and improving transfer capabilities.

Findings

01 Achieves multimodal alignment and translation with 4 to 7 times less matched data than a fully supervised approach.

02 The learned representations are effective for downstream classification and transfer learning.

03 Both the shared workspace and cycle-consistency are crucial for performance.

Abstract

Recent deep learning models can efficiently combine inputs from different modalities (e.g., images and text) and learn to align their latent representations, or to translate signals from one domain to another (as in image captioning, or text-to-image generation). However, current approaches mainly rely on brute-force supervised training over large multimodal datasets. In contrast, humans (and other animals) can learn useful multimodal representations from only sparse experience with matched cross-modal data. Here we evaluate the capabilities of a neural network architecture inspired by the cognitive notion of a "Global Workspace": a shared representation for two (or more) input modalities. Each modality is processed by a specialized system (pretrained on unimodal data, and subsequently frozen). The corresponding latent representations are then encoded to and decoded from a single shared…
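The encode/decode structure described in the abstract, together with the cycle-consistency objective named in the Contribution section, can be illustrated with a minimal sketch. It is not the authors' implementation (see the bdvllrs/bimgw repository for that). It assumes that each modality has already been mapped to a latent vector by its frozen unimodal model, and it uses plain linear maps as hypothetical stand-ins for the workspace encoders and decoders. The names `Workspace`, `cycle_loss`, and `translation_loss`, the modality keys `"a"` and `"b"`, and the mean-squared-error losses are all assumptions made for illustration.

```python
import random


def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def rand_matrix(rows, cols, rng):
    """Small random matrix; a stand-in for a trainable linear layer."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


class Workspace:
    """Hypothetical shared workspace for two modalities, "a" and "b".

    Inputs are assumed to be latent vectors produced by frozen,
    pretrained unimodal models, which are not modeled here.
    """

    def __init__(self, dim_a, dim_b, dim_ws, seed=0):
        rng = random.Random(seed)
        # One encoder into the workspace and one decoder out of it per modality.
        self.enc = {"a": rand_matrix(dim_ws, dim_a, rng),
                    "b": rand_matrix(dim_ws, dim_b, rng)}
        self.dec = {"a": rand_matrix(dim_a, dim_ws, rng),
                    "b": rand_matrix(dim_b, dim_ws, rng)}

    def encode(self, x, mod):
        return matvec(self.enc[mod], x)

    def decode(self, z, mod):
        return matvec(self.dec[mod], z)

    def translate(self, x, src, dst):
        """Map a latent from modality `src` to `dst` through the workspace."""
        return self.decode(self.encode(x, src), dst)


def cycle_loss(ws, x, src, dst):
    """Self-supervised: src -> dst -> src should return to x (no pair needed)."""
    x_back = ws.translate(ws.translate(x, src, dst), dst, src)
    return mse(x_back, x)


def translation_loss(ws, x_src, x_dst, src, dst):
    """Supervised: requires a matched pair (x_src, x_dst)."""
    return mse(ws.translate(x_src, src, dst), x_dst)


if __name__ == "__main__":
    ws = Workspace(dim_a=4, dim_b=3, dim_ws=2)
    unpaired = [0.5, -0.2, 0.1, 0.9]   # latent from modality "a" only
    paired_b = [0.3, 0.0, -0.4]        # matching latent from modality "b"
    print("cycle loss:", cycle_loss(ws, unpaired, "a", "b"))
    print("translation loss:", translation_loss(ws, unpaired, paired_b, "a", "b"))
```

In this sketch, only `translation_loss` consumes matched cross-modal pairs, while `cycle_loss` can be computed on unimodal samples alone. This is one way to read the claim that cycle-consistency reduces the need for matched data; how the two terms are weighted and optimized is left open here.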

Code & Models

Repositories

bdvllrs/bimgw (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Cancer-related molecular mechanisms research · Domain Adaptation and Few-Shot Learning

Methods: ALIGN