TL;DR
This paper introduces M-SAM, a novel framework that dynamically modulates gradients based on modality dominance to improve robustness and balance in multimodal learning.
Contribution
It proposes a modality-aware gradient modulation method that enhances multimodal learning by balancing contributions from different modalities during training.
Findings
M-SAM outperforms state-of-the-art methods on four datasets.
It improves the robustness and balance of multimodal models.
M-SAM effectively identifies and emphasizes dominant modalities.
Abstract
In multimodal learning, dominant modalities often overshadow others, limiting generalization. We propose Modality-Aware Sharpness-Aware Minimization (M-SAM), a model-agnostic framework that applies to many modalities and supports both early and late fusion. In every iteration, M-SAM optimizes learning in three steps. First, it identifies the dominant modality from each modality's contribution to accuracy, measured with Shapley values. Second, it decomposes the loss landscape; in other words, it modulates the loss to prioritize the model's robustness with respect to the dominant modality. Third, M-SAM updates the weights by backpropagating the modulated gradients. This ensures robust learning for the dominant modality while enhancing contributions from others, allowing the model to explore and exploit complementary features that strengthen overall…
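The first step in the abstract, ranking modalities by their Shapley contribution to accuracy, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how accuracy is measured for a subset of modalities, so the `ACCURACY` table, the modality names `audio` and `video`, and the helper `shapley_values` are all hypothetical, and the numbers are made up for illustration. The sketch only shows the standard exact Shapley formula applied to a per-subset accuracy table.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley value of each player for a coalition value function.

    `value` maps a frozenset of players to a number (here: accuracy).
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal gain in value from adding p to the coalition s.
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical accuracies when only the listed modalities are available.
# These numbers are illustrative assumptions, not results from the paper.
ACCURACY = {
    frozenset(): 0.10,
    frozenset({"audio"}): 0.60,
    frozenset({"video"}): 0.40,
    frozenset({"audio", "video"}): 0.70,
}

phi = shapley_values(["audio", "video"], ACCURACY.__getitem__)

# The modality with the largest Shapley value is treated as dominant.
dominant = max(phi, key=phi.get)
```

With this table, `audio` receives the larger share, and the two values sum to the accuracy gain of the full model over the empty baseline, which is a general property of Shapley values. Exact enumeration grows exponentially with the number of modalities, so it is only practical when there are few of them.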