TL;DR
BadCM is an invisible backdoor attack framework for cross-modal learning. It manipulates models across modalities while evading existing defenses, and is designed for broad applicability and high stealthiness.
Contribution
This paper presents the first unified invisible backdoor framework for diverse cross-modal attacks, utilizing modality-invariant regions and specialized generators for high stealthiness.
Findings
Effective in cross-modal retrieval and VQA tasks
Can evade existing backdoor defenses
Demonstrates strong generalization across attack scenarios
Abstract
Although backdoor attacks have achieved remarkable success in unimodal learning tasks, backdoor attacks against cross-modal learning remain underexplored because of their limited generalization and inferior stealthiness when multiple modalities are involved. Notably, since works in this area mainly inherit ideas from unimodal visual attacks, they struggle to handle diverse cross-modal attack circumstances and to manipulate imperceptible trigger samples, which hinders their practicability in real-world applications. In this paper, we introduce a novel bilateral backdoor to fill in the missing pieces of the puzzle in the cross-modal backdoor and propose a generalized invisible backdoor framework against cross-modal learning (BadCM). Specifically, a cross-modal mining scheme is developed to capture the modality-invariant components as target poisoning areas, where well-designed trigger patterns injected into these…
