An Attention Mechanism for Robust Multimodal Integration in a Global Workspace Architecture
Roland Bertin-Johannet, Lara Scipio, Leopold Maytié, Rufin VanRullen

TL;DR
This paper introduces a top-down modality selector, inspired by Global Workspace Theory, that makes multimodal systems more robust to noisy or degraded modalities while using fewer trainable parameters and transferring better than end-to-end attention baselines.
Contribution
It proposes a lightweight top-down modality selector operating on top of a frozen multimodal global workspace, improving robustness and transferability over end-to-end attention methods.
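The idea of a small selector sitting on top of frozen representations can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the dot-product scoring, and the shapes below are all assumptions made for illustration. The only trainable quantities here would be `query` and `key_weights`; the per-modality latents are treated as fixed outputs of a frozen workspace.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def select_and_fuse(latents, query, key_weights):
    """Hypothetical top-down modality selector (illustrative only).

    latents:     (M, D) array, one latent per modality, assumed to come
                 from a frozen workspace encoder.
    query:       (D,) top-down query vector (would be trainable).
    key_weights: (D, D) key projection (would be trainable).
    Returns the fused (D,) vector and the (M,) modality weights.
    """
    keys = latents @ key_weights                        # (M, D)
    scores = keys @ query / np.sqrt(latents.shape[1])   # (M,)
    weights = softmax(scores)                           # sums to 1 over modalities
    fused = weights @ latents                           # weighted sum, shape (D,)
    return fused, weights
```

Under these assumptions, a corrupted modality can be down-weighted by the selector without retraining the encoders, which is the kind of separation between selection and representation learning that the summary describes.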
Findings
The selector improves robustness under structured modality corruptions.
It uses far fewer trainable parameters than end-to-end attention baselines.
It improves global workspace performance and transferability across tasks and modalities.
Abstract
Robust multimodal systems must remain effective when some modalities are noisy, degraded, or unreliable. Existing multimodal fusion methods often learn modality selection jointly with representation learning, making it difficult to determine whether robustness comes from the selector itself or from full end-to-end co-adaptation. Motivated by Global Workspace Theory (GWT), we study this question using a lightweight top-down modality selector operating on top of a frozen multimodal global workspace. We evaluate our method on two multimodal datasets of increasing complexity: Simple Shapes and MM-IMDb 1.0, under structured modality corruptions. The selector improves robustness while using far fewer trainable parameters than end-to-end attention baselines, and the learned selection strategy transfers better across downstream tasks, corruption regimes, and even to a previously unseen…
