MNAFT: modality neuron-aware fine-tuning of multimodal large language models for image translation
Bo Li, Ningyuan Deng, Tianyu Dong, Shaobo Wang, Shaolin Zhu, Lijie Wen

TL;DR
MNAFT is a novel fine-tuning method for multimodal large language models that selectively updates neurons to improve image translation accuracy while preserving pre-trained knowledge.
Contribution
The paper introduces modality neuron-aware fine-tuning (MNAFT), which identifies and selectively updates language-specific and language-agnostic neurons for enhanced image translation.
Findings
MNAFT outperforms state-of-the-art image translation methods on multiple benchmarks.
Selective fine-tuning of neurons preserves pre-trained knowledge and improves generalization.
Neuron activation analysis provides insights into cross-modal understanding.
Abstract
Multimodal large language models (MLLMs) have shown impressive capabilities, yet they often struggle to capture the fine-grained textual information within images that is crucial for accurate image translation. This leads to a modality gap between visual text inputs and textual inputs/outputs for image translation. Existing methods, which rely primarily on instruction fine-tuning, risk parameter redundancy of pre-trained knowledge, hindering generalization performance. To address this, we introduce modality neuron-aware fine-tuning (MNAFT), a novel approach that takes advantage of the specialized roles of individual neurons within MLLMs for enhanced image translation. MNAFT identifies language-agnostic and language-specific neurons in both vision and language modules through an instruction-driven activation analysis, evaluating their importance in various translation tasks. We…
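The general idea of activation-based neuron selection followed by a masked update can be illustrated with a small sketch. This is not the authors' implementation: the abstract is truncated before it describes the update step, and it does not say how importance is scored. The sketch assumes mean absolute activation as the importance score, a top-k cutoff, and plain SGD; the function names (top_k_neurons, categorize_neurons, masked_update) and the toy activation values are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def top_k_neurons(activations: List[List[float]], k: int) -> Set[int]:
    """Return indices of the k neurons with the largest mean |activation|.

    `activations` holds one row per sample and one value per neuron.
    Mean absolute activation is an assumed importance score, not the paper's.
    """
    n_neurons = len(activations[0])
    scores = [
        sum(abs(sample[j]) for sample in activations) / len(activations)
        for j in range(n_neurons)
    ]
    ranked = sorted(range(n_neurons), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def categorize_neurons(
    per_task: Dict[str, List[List[float]]], k: int
) -> Tuple[Set[int], Dict[str, Set[int]]]:
    """Split important neurons into shared (agnostic) and task-only (specific) sets."""
    top = {task: top_k_neurons(acts, k) for task, acts in per_task.items()}
    # Important for every translation task: treated as language-agnostic.
    agnostic = set.intersection(*top.values())
    specific: Dict[str, Set[int]] = {}
    for task, neurons in top.items():
        others = set().union(*(s for t, s in top.items() if t != task))
        # Important for this task only: treated as language-specific.
        specific[task] = neurons - others
    return agnostic, specific


def masked_update(
    weights: List[float], grads: List[float], selected: Set[int], lr: float
) -> List[float]:
    """Apply one SGD step to the selected neurons; all others stay frozen."""
    return [
        w - lr * g if j in selected else w
        for j, (w, g) in enumerate(zip(weights, grads))
    ]


# Toy activations for 4 neurons under two hypothetical translation tasks.
per_task = {
    "en-zh": [[0.9, 0.1, 0.8, 0.0], [1.1, 0.0, 0.6, 0.1]],
    "en-de": [[1.0, 0.7, 0.1, 0.0], [0.8, 0.9, 0.0, 0.2]],
}
agnostic, specific = categorize_neurons(per_task, k=2)
# agnostic is {0}; specific is {"en-zh": {2}, "en-de": {1}}

# Fine-tune for en-zh: only neurons 0 and 2 receive a gradient step.
new_weights = masked_update(
    [1.0, 1.0, 1.0, 1.0], [0.5, 0.5, 0.5, 0.5],
    agnostic | specific["en-zh"], lr=0.1,
)
```

In a real MLLM the same mask would be applied per layer to weight rows or columns in the vision and language modules. Leaving unselected neurons untouched is how this kind of scheme would keep most pre-trained parameters intact.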
