MNAFT: modality neuron-aware fine-tuning of multimodal large language models for image translation

Bo Li; Ningyuan Deng; Tianyu Dong; Shaobo Wang; Shaolin Zhu; Lijie Wen

arXiv:2604.16943·cs.CL·April 21, 2026

TL;DR

MNAFT is a novel fine-tuning method for multimodal large language models that selectively updates neurons to improve image translation accuracy while preserving pre-trained knowledge.

Contribution

The paper introduces modality neuron-aware fine-tuning (MNAFT), which identifies and selectively updates language-specific and language-agnostic neurons for enhanced image translation.
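
The idea of identifying neurons by role and updating only those can be illustrated with a small sketch. This is not the paper's implementation: the activation-rate input, the `threshold` criterion, and the function names `classify_neurons` and `masked_sgd_step` are all assumptions made for illustration, and the actual importance measure used by MNAFT is not specified in this listing.

```python
from typing import Dict, List, Tuple


def classify_neurons(
    activation_rate: Dict[str, List[float]],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> Tuple[List[int], Dict[str, List[int]]]:
    """Split neuron indices into language-agnostic and language-specific sets.

    activation_rate maps each language to a per-neuron activation frequency
    in [0, 1]. A neuron active for every language is treated as agnostic;
    a neuron active for exactly one language is treated as specific to it.
    """
    langs = list(activation_rate)
    n_neurons = len(activation_rate[langs[0]])
    agnostic: List[int] = []
    specific: Dict[str, List[int]] = {lang: [] for lang in langs}
    for i in range(n_neurons):
        active = [lang for lang in langs if activation_rate[lang][i] >= threshold]
        if len(active) == len(langs):
            agnostic.append(i)
        elif len(active) == 1:
            specific[active[0]].append(i)
    return agnostic, specific


def masked_sgd_step(
    weights: List[float],
    grads: List[float],
    trainable: List[int],
    lr: float = 0.1,
) -> List[float]:
    """Apply a plain SGD update only to the selected neuron indices.

    All other weights are returned unchanged, which is how the sketch
    models "preserving pre-trained knowledge".
    """
    selected = set(trainable)
    return [
        w - lr * g if i in selected else w
        for i, (w, g) in enumerate(zip(weights, grads))
    ]
```

For a hypothetical German translation task, one would pass the agnostic indices plus the German-specific indices as `trainable`, leaving every other weight frozen.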

Findings

01 MNAFT outperforms state-of-the-art image translation methods on multiple benchmarks.

02 Selective fine-tuning of neurons preserves pre-trained knowledge and improves generalization.

03 Neuron activation analysis provides insights into cross-modal understanding.

Abstract

Multimodal large language models (MLLMs) have shown impressive capabilities, yet they often struggle to capture the fine-grained textual information within images that is crucial for accurate image translation. This often leads to a modality gap between visual text inputs and textual inputs/outputs for image translation. Existing methods, which rely primarily on instruction fine-tuning, risk parameter redundancy of pre-trained knowledge, which hinders generalization. To address this, we introduce modality neuron-aware fine-tuning (MNAFT), a novel approach that takes advantage of the specialized roles of individual neurons within MLLMs for enhanced image translation. MNAFT identifies language-agnostic and language-specific neurons in both vision and language modules through an instruction-driven activation analysis, evaluating their importance in various translation tasks. We…

Code & Models

Datasets

liboaccn/OPUS-MIT-5M (dataset, 48 downloads)