Merger-as-a-Stealer: Stealing Targeted PII from Aligned LLMs with Model Merging
Lin Lu, Zhigang Zuo, Ziji Sheng, Pan Zhou

TL;DR
This paper presents Merger-as-a-Stealer, an attack in which a malicious participant in model merging extracts targeted personally identifiable information (PII) from an aligned large language model, exposing a privacy risk in unmonitored merging.
Contribution
The paper introduces a two-stage attack framework that shows how a malicious party in model merging can steal targeted PII from an aligned large language model.
Findings
Successful extraction of targeted PII across various models
Effective attack against multiple model merging methods
Highlights need for improved model security and defenses
Abstract
Model merging has emerged as a promising approach for updating large language models (LLMs) by integrating multiple domain-specific models into a cross-domain merged model. Despite its utility and plug-and-play nature, unmonitored mergers can introduce significant security vulnerabilities, such as backdoor attacks and model merging abuse. In this paper, we identify a novel and more realistic attack surface where a malicious merger can extract targeted personally identifiable information (PII) from an aligned model with model merging. Specifically, we propose Merger-as-a-Stealer, a two-stage framework to achieve this attack: First, the attacker fine-tunes a malicious model to force it to respond to any PII-related queries. The attacker then uploads this malicious model to the model merging conductor and obtains the merged model. Second, the attacker inputs direct PII-related…
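To make the merging step in the abstract concrete, the sketch below shows one of the simplest possible merging schemes: a weighted average of parameters that share names and shapes. It is an illustration only. The excerpt above does not say which merging methods the paper evaluates, and the function name `merge_weights`, the toy parameter dictionaries, and the coefficients are all hypothetical.

```python
from typing import Dict, List

# Toy stand-in for a model checkpoint: parameter name -> flat list of weights.
# (Assumption: real checkpoints hold tensors; plain lists keep the sketch dependency-free.)
StateDict = Dict[str, List[float]]


def merge_weights(models: List[StateDict], coeffs: List[float]) -> StateDict:
    """Linearly combine same-named, same-shaped parameters from several models."""
    if len(models) != len(coeffs):
        raise ValueError("need exactly one coefficient per model")
    merged: StateDict = {}
    for name in models[0]:
        vectors = [m[name] for m in models]
        merged[name] = [
            sum(c * v[i] for c, v in zip(coeffs, vectors))
            for i in range(len(vectors[0]))
        ]
    return merged


# Hypothetical two-model merge with equal weighting.
aligned = {"layer.weight": [1.0, 2.0]}
uploaded = {"layer.weight": [3.0, 6.0]}
merged = merge_weights([aligned, uploaded], [0.5, 0.5])
```

Every contributor's weights flow directly into the merged parameters, which is why the abstract treats an unvetted upload as an attack surface.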
Taxonomy
Topics: Artificial Intelligence in Law · Digital Rights Management and Security · Auction Theory and Applications
