Learning to Fuse: Modality-Aware Adaptive Scheduling for Robust Multimodal Foundation Models

Liam Bennett; Mason Clark; Lucas Anderson; Hana Satou; Olivia Martinez

arXiv:2506.12733 · cs.CV · June 17, 2025

TL;DR

This paper introduces MA-AFS, a dynamic fusion framework for multimodal models that adaptively emphasizes more reliable modalities per instance, improving robustness and generalization across vision-language tasks.

Contribution

It proposes a novel neural scheduler for adaptive modality fusion, integrating entropy and agreement cues, enhancing robustness without significantly increasing model complexity.

Findings

01. Achieves consistent performance improvements over strong baselines.
02. Enhances robustness under modality noise and corruption.
03. Improves generalization under domain shifts.

Abstract

Multimodal foundation models have achieved impressive progress across a wide range of vision-language tasks. However, existing approaches often adopt fixed or task-specific fusion strategies, neglecting the intrinsic variability of modality reliability and sample complexity. In this paper, we propose Modality-Aware Adaptive Fusion Scheduling (MA-AFS), a general framework that learns to dynamically modulate the contribution of each modality on a per-instance basis. MA-AFS introduces a lightweight neural scheduler that predicts modality fusion weights by integrating visual and textual entropy signals along with cross-modal agreement cues. This enables the model to adaptively emphasize more reliable modalities, especially under noisy, missing, or misaligned inputs. We formulate the fusion process as a differentiable scheduling mechanism, analyze its theoretical consistency and…
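Illustrative sketch

The abstract describes a scheduler that turns visual entropy, textual entropy, and a cross-modal agreement cue into per-instance fusion weights. The toy Python below shows one way such a mapping could look. It is not the authors' implementation: the function names, the use of cosine similarity as the agreement cue, the single linear layer, and the hand-set parameters `W` and `b` are all assumptions made for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def cosine(u, v):
    """Cosine similarity; an assumed stand-in for the agreement cue."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0.0 else 0.0

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def schedule_weights(p_vis, p_txt, z_vis, z_txt, W, b):
    """Map (H_vis, H_txt, agreement) to two fusion weights that sum to 1.

    W (2x3) and b (length 2) are hypothetical scheduler parameters;
    in a trained model they would be learned, not hand-set.
    """
    feats = [entropy(p_vis), entropy(p_txt), cosine(z_vis, z_txt)]
    logits = [sum(w * f for w, f in zip(row, feats)) + bias
              for row, bias in zip(W, b)]
    return softmax(logits)

def fuse(z_vis, z_txt, weights):
    """Convex combination of the two modality embeddings."""
    w_vis, w_txt = weights
    return [w_vis * a + w_txt * c for a, c in zip(z_vis, z_txt)]

# Hypothetical parameters: each modality's logit is penalised by its own
# entropy; the agreement feature is given zero weight in this toy setup.
W = [[-1.0, 0.0, 0.0],
     [0.0, -1.0, 0.0]]
b = [0.0, 0.0]

p_vis = [0.25, 0.25, 0.25, 0.25]  # uncertain visual prediction
p_txt = [0.97, 0.01, 0.01, 0.01]  # confident textual prediction
z_vis = [1.0, 0.0]                # toy visual embedding
z_txt = [0.0, 1.0]                # toy textual embedding

weights = schedule_weights(p_vis, p_txt, z_vis, z_txt, W, b)
fused = fuse(z_vis, z_txt, weights)
print("fusion weights (vis, txt):", weights)
print("fused embedding:", fused)
```

With these assumed parameters the high-entropy visual input receives the smaller weight, which mirrors the abstract's idea of emphasizing the more reliable modality. Because every step is a smooth function of the inputs, the same structure could be trained end to end, in line with the paper's framing of fusion as a differentiable scheduling mechanism.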

Taxonomy

Topics: Constraint Satisfaction and Optimization · Multi-Agent Systems and Negotiation