Merger-as-a-Stealer: Stealing Targeted PII from Aligned LLMs with Model Merging

Lin Lu; Zhigang Zuo; Ziji Sheng; Pan Zhou

arXiv:2502.16094 · cs.CR · February 25, 2025

TL;DR

This paper presents Merger-as-a-Stealer, an attack in which a malicious participant in model merging extracts targeted personally identifiable information (PII) from an aligned large language model (LLM), exposing a security risk in unmonitored merging.

Contribution

The paper introduces a two-stage attack framework showing how a malicious model submitted for merging can be used to steal targeted PII from an aligned LLM.

Findings

01. Successful extraction of targeted PII across various models
02. Effective attack against multiple model merging methods
03. Highlights need for improved model security and defenses

Abstract

Model merging has emerged as a promising approach for updating large language models (LLMs) by integrating multiple domain-specific models into a cross-domain merged model. Despite its utility and plug-and-play nature, unmonitored mergers can introduce significant security vulnerabilities, such as backdoor attacks and model merging abuse. In this paper, we identify a novel and more realistic attack surface where a malicious merger can extract targeted personally identifiable information (PII) from an aligned model with model merging. Specifically, we propose Merger-as-a-Stealer, a two-stage framework to achieve this attack: First, the attacker fine-tunes a malicious model to force it to respond to any PII-related queries. The attacker then uploads this malicious model to the model merging conductor and obtains the merged model. Second, the attacker inputs direct PII-related…
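The abstract does not say which merging algorithms the paper evaluates. As a minimal sketch, assuming the simplest form of merging (a weighted average of parameters that share the same names and shapes), the following shows why an unvetted uploaded model matters: every merged parameter is a blend that includes the uploaded weights. The function name `merge_by_averaging`, the toy parameter names, and the values are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence

# Toy stand-in for a model checkpoint: parameter name -> flat list of floats.
StateDict = Dict[str, List[float]]


def merge_by_averaging(models: Sequence[StateDict],
                       weights: Sequence[float]) -> StateDict:
    """Merge checkpoints by a weighted average of same-named parameters.

    Assumption: all models share one architecture, so their parameter
    names and sizes match. Real merging methods can be more elaborate.
    """
    if not models:
        raise ValueError("need at least one model to merge")
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")

    names = models[0].keys()
    for model in models:
        if model.keys() != names:
            raise ValueError("models have different parameter names")

    merged: StateDict = {}
    for name in names:
        size = len(models[0][name])
        # Each merged entry mixes the corresponding entry of every model.
        merged[name] = [
            sum(w * m[name][i] for m, w in zip(models, weights)) / total
            for i in range(size)
        ]
    return merged


# Hypothetical two-parameter "models": one aligned, one uploaded by a third party.
aligned = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
uploaded = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}

merged = merge_by_averaging([aligned, uploaded], [0.5, 0.5])
print(merged)
```

With equal weights, each merged parameter sits halfway between the two inputs, so the uploaded model influences every parameter of the result. This is only an illustration of the merging step itself, not of the fine-tuning or querying stages described in the abstract.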

Taxonomy

Topics: Artificial Intelligence in Law · Digital Rights Management and Security · Auction Theory and Applications