InverTune: Removing Backdoors from Multimodal Contrastive Learning Models via Trigger Inversion and Activation Tuning

Mengyuan Sun; Yu Li; Yuchen Liu; Bo Du; Yunjie Ge

arXiv:2506.12411·cs.CR·June 17, 2025

TL;DR

InverTune is a novel defense framework that effectively removes backdoors from multimodal contrastive models like CLIP without prior attack knowledge, using trigger inversion and activation tuning techniques.

Contribution

It introduces a minimal-assumption backdoor defense method for multimodal models that combines adversarial simulation, gradient inversion, and clustering-guided fine-tuning.

Findings

01 Reduces attack success rate by 97.87%

02 Limits clean accuracy degradation to 3.07%

03 Works without prior knowledge of the attack target or access to the poisoned dataset
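
The two metrics named in the findings can be made concrete with a short sketch. This uses the conventional definitions from the backdoor literature and is not taken from the paper: attack success rate (ASR) is assumed to be the fraction of triggered inputs, excluding those whose true label already equals the target, that the model classifies as the attacker's target; clean accuracy is ordinary accuracy on untriggered inputs. The function names and the label encoding are hypothetical, and this page does not say whether the reported percentages are absolute or relative changes.

```python
from typing import Sequence


def attack_success_rate(
    preds_on_triggered: Sequence[int],
    true_labels: Sequence[int],
    target_label: int,
) -> float:
    """Share of triggered inputs pushed to the target label.

    Assumption: samples whose true label is already the target are
    excluded, since predicting the target there is not evidence of
    a working backdoor.
    """
    eligible = [
        pred
        for pred, truth in zip(preds_on_triggered, true_labels)
        if truth != target_label
    ]
    if not eligible:
        return 0.0
    hits = sum(1 for pred in eligible if pred == target_label)
    return hits / len(eligible)


def clean_accuracy(
    preds_on_clean: Sequence[int],
    true_labels: Sequence[int],
) -> float:
    """Plain top-1 accuracy on inputs that carry no trigger."""
    if not true_labels:
        return 0.0
    correct = sum(
        1 for pred, truth in zip(preds_on_clean, true_labels) if pred == truth
    )
    return correct / len(true_labels)
```

A defense is then typically judged by how far ASR drops after purification while clean accuracy stays close to its original value.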

Abstract

Multimodal contrastive learning models like CLIP have demonstrated remarkable vision-language alignment capabilities, yet their vulnerability to backdoor attacks poses critical security risks. Attackers can implant latent triggers that persist through downstream tasks, enabling malicious control of model behavior upon trigger presentation. Although recent defense mechanisms have achieved great success, they remain impractical due to strong assumptions about attacker knowledge or excessive clean data requirements. In this paper, we introduce InverTune, the first backdoor defense framework for multimodal models under minimal attacker assumptions, requiring neither prior knowledge of attack targets nor access to the poisoned dataset. Unlike existing defense methods that rely on the same dataset used in the poisoning stage, InverTune effectively identifies and removes backdoor artifacts through three…

Taxonomy

Topics: Speech Recognition and Synthesis

Methods: Contrastive Learning · Contrastive Language-Image Pre-training