Modality-Aware SAM: Sharpness-Aware-Minimization Driven Gradient Modulation for Harmonized Multimodal Learning

Hossein R. Nowdeh; Jie Ji; Xiaolong Ma; Fatemeh Afghah

arXiv:2510.24919 · cs.CV · October 30, 2025

TL;DR

This paper introduces M-SAM, a novel framework that dynamically modulates gradients based on modality dominance to improve robustness and balance in multimodal learning.

Contribution

It proposes a modality-aware gradient modulation method that enhances multimodal learning by balancing contributions from different modalities during training.

Findings

1. M-SAM outperforms state-of-the-art methods on four datasets.

2. It improves the robustness and balance of multimodal models.

3. M-SAM effectively identifies and emphasizes dominant modalities.

Abstract

In multimodal learning, dominant modalities often overshadow others, limiting generalization. We propose Modality-Aware Sharpness-Aware Minimization (M-SAM), a model-agnostic framework that applies to many modalities and supports both early and late fusion. In every iteration, M-SAM optimizes learning in three steps. First, it identifies the dominant modality based on each modality's contribution to accuracy, measured with Shapley values. Second, it decomposes the loss landscape or, in other words, modulates the loss to prioritize the model's robustness in favor of the dominant modality. Third, it updates the weights by backpropagating the modulated gradients. This ensures robust learning for the dominant modality while enhancing contributions from others, allowing the model to explore and exploit complementary features that strengthen overall…
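To make the first and third steps of the abstract more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It shows (a) exact Shapley values computed from a table of per-subset accuracies, used to pick the dominant modality, and (b) one standard sharpness-aware minimization (SAM) update on a toy loss. The modality names, the accuracy numbers, the toy loss, and the function names `shapley_values` and `sam_step` are all assumptions made for illustration. How M-SAM modulates the loss in favor of the dominant modality is not specified in the abstract, so that step is deliberately left out.

```python
from itertools import combinations
from math import factorial, sqrt


def shapley_values(players, value):
    """Exact Shapley value of each player for a set function `value`."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal gain in `value` from adding p to the coalition.
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical accuracies when only a subset of modalities is available.
# These numbers are made up for illustration, not taken from the paper.
ACCURACY = {
    frozenset(): 0.10,
    frozenset({"audio"}): 0.50,
    frozenset({"video"}): 0.30,
    frozenset({"audio", "video"}): 0.60,
}

phi = shapley_values(["audio", "video"], ACCURACY.__getitem__)
dominant = max(phi, key=phi.get)  # modality with the largest contribution


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One standard SAM update (not the modulated M-SAM variant)."""
    g = grad_fn(w)
    norm = sqrt(sum(gi * gi for gi in g)) or 1.0  # guard against a zero gradient
    # Move to the approximate worst-case point within an L2 ball of radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Descend from the original weights using the gradient at the perturbed point.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


def toy_loss(w):
    """Toy quadratic loss, an assumption used only to exercise sam_step."""
    return 0.5 * (w[0] ** 2 + 10.0 * w[1] ** 2)


def toy_grad(w):
    """Analytic gradient of toy_loss."""
    return [w[0], 10.0 * w[1]]


w_new = sam_step([1.0, 1.0], toy_grad)
```

With these made-up accuracies, the Shapley values sum to the gap between the full-model accuracy and the empty-set baseline (0.60 - 0.10), and "audio" comes out as the dominant modality. In a real pipeline, the accuracy table would come from evaluating the model with modalities masked out, and the gradient would come from automatic differentiation rather than a hand-written function.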
