Merge Hijacking: Backdoor Attacks to Model Merging of Large Language Models

Zenghui Yuan; Yangming Xu; Jiawen Shi; Pan Zhou; Lichao Sun

arXiv:2505.23561·cs.CR·May 30, 2025


TL;DR

This paper introduces Merge Hijacking, a backdoor attack on the model merging process for large language models, and shows that it is effective across various models and merging techniques and robust against defenses.

Contribution

It presents the first backdoor attack specifically designed for model merging in LLMs, highlighting vulnerabilities and potential security risks.

Findings

01 The attack remains effective across different models and merging algorithms.

02 The backdoor persists in real-world models and against defenses.

03 The method demonstrates high effectiveness and robustness in experiments.

Abstract

Model merging for Large Language Models (LLMs) directly fuses the parameters of different models fine-tuned on various tasks, creating a unified model for multi-domain tasks. However, due to potential vulnerabilities in models available on open-source platforms, model merging is susceptible to backdoor attacks. In this paper, we propose Merge Hijacking, the first backdoor attack targeting model merging in LLMs. The attacker constructs a malicious upload model and releases it. Once a victim user merges it with any other models, the resulting merged model inherits the backdoor while maintaining utility across tasks. Merge Hijacking defines two main objectives, effectiveness and utility, and achieves them through four steps. Extensive experiments demonstrate the effectiveness of our attack across different models, merging algorithms, and tasks. Additionally, we show that the attack remains…
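To make the phrase "directly fuses the parameters" concrete, here is a minimal sketch of one common merging scheme, task arithmetic, in which each fine-tuned model's offset from a shared base model is scaled and added to the base. This is an illustration only: the abstract does not say which merging algorithms the paper evaluates, and the names `task_vector`, `merge_task_arithmetic`, `scale`, and the toy weights are all hypothetical. It does not implement the attack itself.

```python
from typing import Dict, List

# Toy stand-in for a model checkpoint: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def task_vector(base: StateDict, finetuned: StateDict) -> StateDict:
    """Per-parameter difference between a fine-tuned model and its base."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def merge_task_arithmetic(
    base: StateDict, finetuned_models: List[StateDict], scale: float = 1.0
) -> StateDict:
    """Add each model's scaled task vector to a copy of the base weights."""
    merged = {name: list(values) for name, values in base.items()}
    for model in finetuned_models:
        delta = task_vector(base, model)
        for name in merged:
            merged[name] = [
                m + scale * d for m, d in zip(merged[name], delta[name])
            ]
    return merged


# Hypothetical weights: a shared base and two models fine-tuned from it.
base = {"w": [0.0, 0.0, 0.0]}
model_a = {"w": [1.0, 0.0, 0.0]}
model_b = {"w": [0.0, 2.0, 0.0]}

merged = merge_task_arithmetic(base, [model_a, model_b], scale=0.5)
print(merged["w"])
```

Every input model contributes directly to the merged weights, which is why a single uploaded model can influence the behavior of the merged result.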



Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Hate Speech and Cyberbullying Detection