TL;DR
This paper introduces a generative, consensus-based mitigation method that uses VAEs to defend multi-modal models against adversarial illusions, reducing illusion attack success rates to near-zero on ImageBind.
Contribution
It proposes a novel, task-agnostic defense mechanism employing generative sampling and consensus aggregation to counter adversarial illusions in multi-modal embeddings.
Findings
Reduces illusion attack success rates to near-zero on ImageBind.
Improves cross-modal alignment for both perturbed and unperturbed inputs.
Provides an effective, task-agnostic defense mechanism.
Abstract
Multi-modal foundation models align images, text, and other modalities in a shared embedding space but remain vulnerable to adversarial illusions [35], where imperceptible perturbations disrupt cross-modal alignment and mislead downstream tasks. To counteract the effects of adversarial illusions, we propose a task-agnostic mitigation mechanism that purifies the attacker's perturbed input using generative models, e.g., Variational Autoencoders (VAEs), to restore natural alignment. To further enhance the defense mechanism, we adopt a generative sampling strategy combined with a consensus-based aggregation scheme over the outcomes of the generated samples. Our experiments on ImageBind, a state-of-the-art multi-modal encoder, show that our approach substantially reduces the illusion attack success rates to near-zero and improves cross-modal alignment in unperturbed and perturbed input…
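The sample-then-vote idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: `toy_purify` is a hypothetical stand-in for a VAE encode-sample-decode pass, `toy_embed` stands in for a multi-modal encoder such as ImageBind, and the aggregation rule shown (a majority vote over nearest-label predictions by cosine similarity) is only one plausible reading of "consensus-based aggregation".

```python
from collections import Counter

import numpy as np


def consensus_label(x, purify, embed, label_embeddings, n_samples=8, rng=None):
    """Majority vote over labels predicted from n_samples purified copies of x.

    Returns (winning label index, fraction of samples that agreed with it).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # Normalize label embeddings once so dot products are cosine similarities.
    labels = label_embeddings / np.linalg.norm(label_embeddings, axis=1, keepdims=True)
    votes = []
    for _ in range(n_samples):
        z = embed(purify(x, rng))      # one stochastic purification, then embed
        z = z / np.linalg.norm(z)
        votes.append(int(np.argmax(labels @ z)))  # nearest label for this sample
    winner, count = Counter(votes).most_common(1)[0]
    return winner, count / n_samples


def toy_purify(x, rng, noise=0.05):
    # Hypothetical stand-in: a real defense would encode x with a VAE, sample a
    # latent, and decode. Here we only mimic the stochasticity with small noise.
    return x + noise * rng.standard_normal(x.shape)


def toy_embed(x):
    # Hypothetical stand-in for a multi-modal encoder; identity for the demo.
    return x


# Three candidate labels with orthogonal embeddings, and an input near label 0.
label_embeddings = np.eye(3)
x = np.array([0.9, 0.1, 0.0])
winner, agreement = consensus_label(x, toy_purify, toy_embed, label_embeddings)
```

In this toy setup the vote is taken over discrete label predictions, so the agreement fraction doubles as a rough confidence signal: a low value would suggest that the purified samples disagree about the input.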
