Dual-Priv Pruning: Efficient Differential Private Fine-Tuning in Multimodal Large Language Models
Qianshan Wei, Jiaqi Li, Zihan You, Yi Zhan, Kecen Li, Jialin Wu, Xinfeng Li, Hengjun Liu, Yi Yu, Bin Cao, Yiwen Xu, Yang Liu, Guilin Qi

TL;DR
This paper introduces Dual-Priv Pruning, a novel framework for efficient differentially private fine-tuning of multimodal large language models that reduces computational cost and the impact of privacy noise while maintaining competitive performance.
Contribution
It proposes a dual pruning approach that combines visual token pruning with gradient-update pruning to improve the privacy-utility trade-off in DP fine-tuning of MLLMs.
Findings
Achieves competitive results with minimal performance loss.
Uses less memory than standard DP-SGD methods.
First to explore DP fine-tuning in MLLMs.
Abstract
Differential Privacy (DP) is a widely adopted technique, valued for its effectiveness in protecting the privacy of task-specific datasets, which makes it a critical tool for large language models. However, its effectiveness in Multimodal Large Language Models (MLLMs) remains uncertain. Applying DP inherently introduces substantial computational overhead, a concern particularly relevant for MLLMs, which process extensive textual and visual data. Furthermore, a critical challenge of DP is that the injected noise, necessary for privacy, scales with parameter dimensionality, leading to pronounced model degradation. This trade-off between privacy and utility complicates the application of DP to complex architectures like MLLMs. To address these challenges, we propose Dual-Priv Pruning, a framework that employs two complementary pruning mechanisms for DP fine-tuning…
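The dimensionality claim follows from the Gaussian mechanism used in standard DP-SGD: each per-sample gradient is clipped to L2 norm at most C, so the summed signal over a batch of B examples has norm at most BC, while isotropic noise with standard deviation σC per coordinate has expected norm of roughly σC√d in d dimensions. Below is a minimal NumPy sketch of that standard aggregation step, not the authors' code; the toy Gaussian per-sample gradients and the helper names `clip_and_noise` and `signal_to_noise` are illustrative assumptions.

```python
import numpy as np

def clip_and_noise(per_sample_grads, clip_norm, noise_multiplier, rng):
    """Standard DP-SGD aggregation: clip each row to L2 norm <= clip_norm,
    sum over the batch, and draw isotropic Gaussian noise with std
    noise_multiplier * clip_norm per coordinate. Returns (signal, noise)."""
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    factors = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    signal = (per_sample_grads * factors).sum(axis=0)  # norm <= B * clip_norm
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=signal.shape)
    return signal, noise

def signal_to_noise(dim, batch_size=32, clip_norm=1.0, noise_multiplier=1.0, seed=0):
    """Ratio ||signal|| / ||noise|| for toy Gaussian per-sample gradients."""
    rng = np.random.default_rng(seed)
    grads = rng.normal(size=(batch_size, dim))  # hypothetical stand-in gradients
    signal, noise = clip_and_noise(grads, clip_norm, noise_multiplier, rng)
    return np.linalg.norm(signal) / np.linalg.norm(noise)

for d in (1_000, 100_000):
    print(f"d={d:>7}: signal/noise = {signal_to_noise(d):.4f}")
```

With these toy settings the ratio should shrink by roughly a factor of ten when d grows 100-fold, consistent with the 1/√d scaling that motivates reducing the number of noised parameters.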
Peer Reviews
Decision: Submitted to ICLR 2026
1. The motivation behind the problem is clear: MLLMs are computationally expensive, and differential privacy suffers from reduced utility as dimensionality increases. 2. The method design is simple and relatively easy to implement. 3. The experiments and ablation studies are relatively comprehensive.
1. It is unclear whether importance scores should be used as evidence for discarding tokens. This seems to be a common engineering technique, and the authors also appear to have demonstrated the significance of discarding them. However, discarding tokens based solely on attention-based importance seems to lack rigorous justification. 2. It is unclear why the authors conducted accuracy experiments on the Q&A dataset: the authors proposed a new method for differential privacy, but testing
- First paper to consider private training of VLLMs, opening up a new avenue for research
- The paper proposes some interesting techniques for improving the utility of differentially private training on high-dimensional data. These techniques might be useful for differentially private training in settings beyond VLLMs
- Comprehensive evaluation with open-source code
- Consistent improvement over the baseline method
- Great, easy-to-follow presentation
The first technique, which reduces the dimensionality of the input image by selecting the most relevant tokens and fusing the remaining tokens with an averaging+clustering method, does not have a differential privacy guarantee. Instead, noise is added to the fused tokens heuristically. Unless I am missing something, the end-to-end (E2E) algorithm is not technically differentially private, and I think this should be emphasized further in the limitations/intro. I agree that for practical privacy guarantees and as
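To make the reviewer's description concrete, here is a minimal NumPy sketch of select-then-fuse token reduction under stated assumptions: importance scores are taken as given (the reviews mention attention-based scores from selected vision-encoder layers, but their exact computation is not reproduced here), plain k-means stands in for the unspecified clustering step, and the Gaussian noise on fused tokens uses a hypothetical, uncalibrated `noise_std`. As the reviewer notes, noise like this is not tied to any sensitivity bound, so the sketch carries no DP guarantee on its own. The function `select_and_fuse_tokens` and its parameters are hypothetical.

```python
import numpy as np

def select_and_fuse_tokens(tokens, importance, k, num_clusters, noise_std, rng, iters=10):
    """tokens: (N, D) visual tokens; importance: (N,) scores (assumed given).
    Keep the top-k tokens, cluster the rest with plain k-means (a stand-in),
    replace each cluster by its mean, and add uncalibrated Gaussian noise."""
    order = np.argsort(importance)[::-1]          # most important first
    kept = tokens[order[:k]]
    rest = tokens[order[k:]]
    if len(rest) == 0:
        return kept
    c = min(num_clusters, len(rest))
    # Initialise centroids from distinct remaining tokens.
    centroids = rest[rng.choice(len(rest), size=c, replace=False)].copy()
    for _ in range(iters):
        dists = ((rest[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)  # (M, c)
        labels = dists.argmin(axis=1)
        for j in range(c):
            members = rest[labels == j]
            if len(members) > 0:                  # empty clusters keep old centroid
                centroids[j] = members.mean(axis=0)
    # Heuristic noise on the fused tokens only (no sensitivity calibration).
    fused = centroids + rng.normal(0.0, noise_std, size=centroids.shape)
    return np.concatenate([kept, fused], axis=0)

# Toy usage: 576 random tokens of width 64 (illustrative sizes only).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(576, 64))
importance = rng.random(576)
out = select_and_fuse_tokens(tokens, importance, k=64, num_clusters=16, noise_std=0.1, rng=rng)
print(out.shape)
```

In this toy run, 576 tokens become 64 kept tokens plus 16 fused ones, so 80 tokens would reach the language model. Keeping the highest-scoring tokens unchanged while summarizing the rest is what lets this kind of scheme cut compute without discarding the low-scoring tokens entirely.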
1. This paper addresses an important challenge, the privacy-utility trade-off, in a multimodal LLM fine-tuning setup. 2. The proposed method employs two levels of pruning, one each for a) reducing memory overhead and b) reducing the impact of DP noise on utility. 3. The experimental results show the accuracy improvements obtained with dual-priv pruning compared with DP-SGD (first-order DP fine-tuning) and DPZO (zeroth-order DP fine-tuning) across various benchmarks and privacy settings. 4.
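The second level, reducing the impact of DP noise, can be illustrated by a DP-SGD step that clips and noises only a chosen subset of m coordinates, so the injected noise has expected norm of about σC√m instead of σC√d. This minimal NumPy sketch is not the authors' gradient-pruning algorithm: the function name `masked_dp_sgd_step` is hypothetical, and it assumes the mask is fixed before the step without looking at the private batch (for example, chosen from public information). Selecting coordinates from private gradients would itself need privacy accounting.

```python
import numpy as np

def masked_dp_sgd_step(params, per_sample_grads, mask, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD step restricted to the coordinates where mask is True.
    params: (d,); per_sample_grads: (B, d); mask: boolean (d,), assumed
    fixed independently of the private batch (hypothetical setup)."""
    g = per_sample_grads[:, mask]                      # (B, m) restricted gradients
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    g = g * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))  # per-sample clip
    # Gaussian noise only in the m selected dimensions.
    noisy_sum = g.sum(axis=0) + rng.normal(0.0, noise_multiplier * clip_norm, size=g.shape[1])
    new_params = params.copy()
    new_params[mask] -= lr * noisy_sum / per_sample_grads.shape[0]
    return new_params

# Toy usage: update only 1% of 10,000 parameters (illustrative sizes only).
rng = np.random.default_rng(0)
d, B = 10_000, 16
params = np.zeros(d)
grads = rng.normal(size=(B, d))
mask = np.zeros(d, dtype=bool)
mask[:100] = True
new = masked_dp_sgd_step(params, grads, mask, lr=0.1, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
```

Only the 100 masked coordinates move; the other 9,900 parameters receive neither gradient nor noise, which is the mechanism by which shrinking the update set can reduce the noise burden.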
The proposed mechanism introduces additional hyperparameters that need to be tuned, such as 1) the selected layers of the vision encoder used to compute the importance scores, and 2) the K and |C| values for pruning in step 1 (token selection) and step 2 (gradient pruning). Tuning these parameters to obtain reasonable trade-offs can introduce heavy computational overhead.
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Advanced Neural Network Applications · Topic Modeling
Methods: Pruning
