BadMerging: Backdoor Attacks Against Model Merging

Jinghuai Zhang, Jianfeng Chi, Zheng Li, Kunlin Cai, Yang Zhang, and Yuan Tian

arXiv:2408.07362·cs.CR·September 4, 2024

TL;DR

This paper introduces BadMerging, a novel backdoor attack targeting model merging techniques, demonstrating its effectiveness and exposing security vulnerabilities in current model merging practices.

Contribution

It presents the first backdoor attack specifically designed for model merging, with a two-stage mechanism and feature-interpolation loss to improve robustness against merging variations.
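
The summary above names a feature-interpolation loss but does not define it. As a minimal sketch of one plausible reading, the code below mixes the features that a fine-tuned encoder and the pre-trained encoder produce for the same triggered input, then applies cross-entropy toward the attacker's target class. The function name, the linear classifier `class_weights`, and the mixing coefficient `p` are all assumptions made for illustration and are not taken from the paper or its repository.

```python
import numpy as np

def feature_interpolation_loss(feat_finetuned, feat_pretrained, class_weights, target, p):
    """Cross-entropy toward `target` on features interpolated between two encoders.

    Hypothetical illustration: p = 1 uses only the fine-tuned features,
    p = 0 uses only the pre-trained features.
    """
    # Interpolate the two feature vectors for the same (triggered) input.
    mixed = p * feat_finetuned + (1.0 - p) * feat_pretrained
    # Linear classification head (assumed), followed by a stable log-softmax.
    logits = class_weights @ mixed
    logits = logits - logits.max()
    log_probs = logits - np.log(np.exp(logits).sum())
    return float(-log_probs[target])

# Toy features: the fine-tuned encoder points at class 0, the pre-trained one at class 1.
feat_ft = np.array([2.0, 0.0])
feat_pre = np.array([0.0, 2.0])
head = np.eye(2)

loss_ft_only = feature_interpolation_loss(feat_ft, feat_pre, head, target=0, p=1.0)
loss_half = feature_interpolation_loss(feat_ft, feat_pre, head, target=0, p=0.5)
loss_pre_only = feature_interpolation_loss(feat_ft, feat_pre, head, target=0, p=0.0)
```

Under this reading, averaging the loss over many values of `p` during training would push the backdoor to survive whatever weighting the merged model ends up applying, which is the robustness to merging variations the summary describes.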

Findings

1. BadMerging successfully compromises merged models across various merging algorithms.

2. The attack remains effective under different merging parameters and task domains.

3. Existing defenses fail to detect or mitigate BadMerging attacks.

Abstract

Fine-tuning pre-trained models for downstream tasks has led to a proliferation of open-sourced task-specific models. Recently, Model Merging (MM) has emerged as an effective approach to facilitate knowledge transfer among these independently fine-tuned models. MM directly combines multiple fine-tuned task-specific models into a merged model without additional training, and the resulting model shows enhanced capabilities in multiple tasks. Although MM provides great utility, it may come with security risks because an adversary can exploit MM to affect multiple downstream tasks. However, the security risks of MM have barely been studied. In this paper, we first find that MM, as a new learning paradigm, introduces unique challenges for existing backdoor attacks due to the merging process. To address these challenges, we introduce BadMerging, the first backdoor attack specifically designed…
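
To make "combines multiple fine-tuned task-specific models into a merged model without additional training" concrete, here is a minimal sketch of one common merging scheme, task arithmetic. The abstract does not say which merging algorithms are studied, so the choice of scheme, the function name, and the `scale` parameter are assumptions made for illustration. Weights are plain NumPy arrays keyed by parameter name, standing in for a real checkpoint.

```python
import numpy as np

def merge_task_arithmetic(pretrained, finetuned_models, scale=0.3):
    """Merge checkpoints by adding scaled task vectors to the pre-trained weights.

    Hypothetical helper: every model is a dict mapping parameter names to arrays
    of identical shape, all fine-tuned from the same pre-trained weights.
    """
    merged = {}
    for name, base in pretrained.items():
        # Task vector: how far each fine-tuned model moved from the shared start.
        task_vectors = [model[name] - base for model in finetuned_models]
        # No gradient steps are involved; merging is pure weight arithmetic.
        merged[name] = base + scale * np.sum(task_vectors, axis=0)
    return merged

# Toy example with a single two-element parameter.
pretrained = {"w": np.array([1.0, 1.0])}
task_a = {"w": np.array([2.0, 1.0])}  # moved along the first coordinate
task_b = {"w": np.array([1.0, 3.0])}  # moved along the second coordinate

merged = merge_task_arithmetic(pretrained, [task_a, task_b], scale=0.5)
```

Because every contributed checkpoint is scaled by a coefficient the adversary does not control, a backdoor planted in one model can be diluted by the merge, which is one way to read the "unique challenges" the abstract mentions.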

Code & Models

Repositories

jzhang538/badmerging (PyTorch, official)

Taxonomy

Topics: Access Control and Trust · Security and Verification in Computing · Formal Methods in Verification