Explore How to Inject Beneficial Noise in MLLMs

Ruishu Zhu; Sida Huang; Ziheng Jiao; Hongyuan Zhang

arXiv:2511.12917 · cs.CV · November 18, 2025



TL;DR

This paper introduces a novel fine-tuning method for Multimodal Large Language Models that injects beneficial noise to improve cross-modal alignment and performance, surpassing traditional fine-tuning techniques with minimal additional parameters.

Contribution

The paper proposes MuNG, a multimodal noise generator that dynamically analyzes cross-modal relationships to inject task-adaptive noise, enhancing MLLMs without full fine-tuning.

Findings

1. Outperforms full fine-tuning and existing methods
2. Requires only 1-2% additional parameters
3. Improves cross-modal representation and downstream task performance

Abstract

Multimodal Large Language Models (MLLMs) have played an increasingly important role in multimodal intelligence. However, existing fine-tuning methods often ignore cross-modal heterogeneity, which limits the models' full potential. In this work, we propose a novel fine-tuning strategy that injects beneficial random noise; it outperforms previous methods and even surpasses full fine-tuning, with minimal additional parameters. The proposed Multimodal Noise Generator (MuNG) enables efficient modality fine-tuning by injecting customized noise into the frozen MLLMs. Specifically, we reformulate the reasoning process of MLLMs from a variational inference perspective, upon which we design a multimodal noise generator that dynamically analyzes cross-modal relationships in image-text pairs to generate task-adaptive beneficial noise. Injecting this type of noise into the MLLMs effectively suppresses…
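The general idea of a learned, input-dependent noise generator can be sketched in a few lines. The following is a minimal illustration only, not the authors' MuNG architecture: it assumes a small cross-attention module that reads text tokens, predicts a Gaussian mean and log-variance per image token, and samples noise with the reparameterization trick. All names (`init_generator`, `inject_noise`, the weight keys), the layer sizes, and the choice to add the noise to image tokens are hypothetical; the frozen backbone itself is not modelled.

```python
import numpy as np


def init_generator(dim, hidden, rng):
    """Weights of a hypothetical noise generator (the only trainable part)."""
    scale = 0.02
    return {
        "w_q": rng.normal(0.0, scale, (dim, hidden)),
        "w_k": rng.normal(0.0, scale, (dim, hidden)),
        "w_mu": rng.normal(0.0, scale, (dim, dim)),
        "w_logvar": rng.normal(0.0, scale, (dim, dim)),
    }


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def inject_noise(image_tokens, text_tokens, params, rng):
    """Sample text-conditioned Gaussian noise and add it to image tokens.

    Assumption: noise is added to image tokens; the abstract does not
    state where in the model the noise is injected.
    """
    # Cross-attention: each image token attends over the text tokens.
    q = image_tokens @ params["w_q"]                 # (n_img, hidden)
    k = text_tokens @ params["w_k"]                  # (n_txt, hidden)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (n_img, n_txt)
    context = attn @ text_tokens                     # (n_img, dim)

    # Predict the parameters of a diagonal Gaussian per image token.
    mu = context @ params["w_mu"]
    logvar = context @ params["w_logvar"]

    # Reparameterization trick: noise = mu + sigma * eps, eps ~ N(0, I).
    eps = rng.standard_normal(mu.shape)
    noise = mu + np.exp(0.5 * logvar) * eps
    return image_tokens + noise, noise


rng = np.random.default_rng(0)
image_tokens = rng.standard_normal((4, 8))   # stand-in for frozen image features
text_tokens = rng.standard_normal((6, 8))    # stand-in for frozen text features
params = init_generator(dim=8, hidden=16, rng=rng)
noisy_tokens, noise = inject_noise(image_tokens, text_tokens, params, rng)
n_params = sum(w.size for w in params.values())
```

In a setup like this, only the generator weights would be trained while the backbone stays frozen, which is how such a scheme could keep the added parameter count small.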


Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Speech and dialogue systems