AIM: Adaptive Intra-Network Modulation for Balanced Multimodal Learning

Shu Shen; C. L. Philip Chen; Tong Zhang

arXiv:2508.19769·cs.CV·November 4, 2025

TL;DR

This paper introduces AIM, a method for balanced multimodal learning that adaptively modulates network parameters so that no single modality dominates training, improving performance on multiple benchmarks.

Contribution

AIM addresses optimization bias within multimodal networks by decoupling the dominant modality's parameters and adapting the modulation across network depths.

Findings

01. Outperforms state-of-the-art methods on multiple benchmarks.

02. Demonstrates strong generalizability across backbones, fusion strategies, and optimizers.

03. Effectively balances modality learning without hindering dominant or weak modalities.

Abstract

Multimodal learning has significantly enhanced machine learning performance but still faces numerous challenges and limitations. Imbalanced multimodal learning is one of the problems extensively studied in recent works and is typically mitigated by modulating the learning of each modality. However, we find that these methods typically hinder the dominant modality's learning to promote weaker modalities, which affects overall multimodal performance. We analyze the cause of this issue and highlight a commonly overlooked problem: optimization bias within networks. To address this, we propose Adaptive Intra-Network Modulation (AIM) to improve balanced modality learning. AIM accounts for differences in optimization state across parameters and depths within the network during modulation, achieving balanced multimodal learning without hindering either dominant or weak modalities for the first…
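To make the criticized baseline concrete, here is a minimal sketch of the kind of conventional modality-level modulation the abstract says prior work relies on. It is not AIM and is not taken from the paper: the function names, the `alpha` parameter, the tanh-based rule, and the use of per-modality confidence scores are all assumptions chosen for illustration. It shows how damping the dominant modality's gradients can slow that modality's learning, which is the side effect the abstract points to.

```python
import math


def modulation_coefficients(score_a, score_b, alpha=1.0):
    """Return hypothetical gradient scales (coeff_a, coeff_b), each in [0, 1].

    score_a / score_b are assumed per-modality confidence scores, e.g. the
    mean probability each unimodal branch assigns to the correct class.
    The modality with the higher score is treated as dominant and damped;
    the weaker modality keeps its full gradient.
    """
    if score_a <= 0 or score_b <= 0:
        raise ValueError("scores must be positive")
    ratio = score_a / score_b
    if ratio > 1.0:
        # Modality a dominates: shrink its update, leave b untouched.
        return 1.0 - math.tanh(alpha * (ratio - 1.0)), 1.0
    # Modality b dominates (or the two are tied): shrink b instead.
    return 1.0, 1.0 - math.tanh(alpha * (1.0 / ratio - 1.0))


def scale_gradients(grads, coeff):
    """Apply one scalar coefficient to every gradient of a modality encoder."""
    return [g * coeff for g in grads]


# Usage: modality a is twice as confident as modality b, so only a is damped.
coeff_a, coeff_b = modulation_coefficients(0.8, 0.4)
damped_grads_a = scale_gradients([0.5, -1.0], coeff_a)
```

Note that a single coefficient is applied uniformly to every parameter of the dominant encoder. According to the abstract, AIM instead accounts for differences in optimization state across parameters and depths, which this sketch does not attempt to reproduce.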
