An Attention Mechanism for Robust Multimodal Integration in a Global Workspace Architecture

Roland Bertin-Johannet; Lara Scipio; Leopold Maytié; Rufin VanRullen

arXiv:2602.08597·cs.AI·March 31, 2026

TL;DR

This paper introduces a top-down modality selector, inspired by Global Workspace Theory, that makes multimodal systems more robust to noisy or degraded modalities while using fewer trainable parameters and transferring better than end-to-end attention baselines.

Contribution

It pairs a frozen multimodal global workspace with a lightweight top-down modality selector, which improves robustness and transferability over end-to-end attention baselines.

Findings

01

Selector improves robustness under structured modality corruptions.

02

Uses fewer trainable parameters than end-to-end attention baselines.

03

The learned selection strategy transfers better across downstream tasks and corruption regimes than end-to-end attention baselines.

Abstract

Robust multimodal systems must remain effective when some modalities are noisy, degraded, or unreliable. Existing multimodal fusion methods often learn modality selection jointly with representation learning, making it difficult to determine whether robustness comes from the selector itself or from full end-to-end co-adaptation. Motivated by Global Workspace Theory (GWT), we study this question using a lightweight top-down modality selector operating on top of a frozen multimodal global workspace. We evaluate our method on two multimodal datasets of increasing complexity: Simple Shapes and MM-IMDb 1.0, under structured modality corruptions. The selector improves robustness while using far fewer trainable parameters than end-to-end attention baselines, and the learned selection strategy transfers better across downstream tasks, corruption regimes, and even to a previously unseen…
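
To make the idea of a selector operating on top of frozen representations concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function names (`softmax`, `fuse`), the two-dimensional latents, and the hand-picked selector scores are all hypothetical, and softmax-weighted summation is assumed here only as one common way to combine modalities. The point it illustrates is that the per-modality latents stay fixed while only the selection weights change.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(latents, scores):
    """Combine per-modality latents using selector weights.

    latents: dict name -> list[float]; stands in for frozen encoder
             outputs (hypothetical values, never modified here).
    scores:  dict name -> float; stands in for selector logits, the
             only part that would be trained in this sketch.
    Returns (weights by modality name, fused vector).
    """
    names = sorted(latents)
    weights = softmax([scores[n] for n in names])
    dim = len(latents[names[0]])
    fused = [0.0] * dim
    for name, w in zip(names, weights):
        for i, x in enumerate(latents[name]):
            fused[i] += w * x
    return dict(zip(names, weights)), fused


# Hypothetical example: the selector scores the text modality as
# unreliable, so the fused vector is dominated by the image latent.
latents = {"image": [1.0, 0.0], "text": [0.0, 1.0]}
weights, fused = fuse(latents, {"image": 4.0, "text": -4.0})
```

In this toy setup, lowering a modality's score shrinks its contribution to the fused vector without touching the latents themselves, which is the separation between selection and representation learning that the abstract describes.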
