Breaking the Illusion: Consensus-Based Generative Mitigation of Adversarial Illusions in Multi-Modal Embeddings

Fatemeh Akbarian; Anahita Baninajjar; Yingyi Zhang; Ananth Balashankar; Amir Aminifar

arXiv:2511.21893 · cs.LG · April 22, 2026

TL;DR

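This paper introduces a task-agnostic defense for multi-modal models against adversarial illusions. It purifies the perturbed input with a generative model such as a VAE, then aggregates the outcomes of multiple generated samples by consensus, reducing illusion attack success rates to near-zero on ImageBind.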

Contribution

It proposes a novel, task-agnostic defense mechanism employing generative sampling and consensus aggregation to counter adversarial illusions in multi-modal embeddings.

Findings

01. Reduces illusion attack success rates to near-zero.
02. Improves cross-modal alignment in both perturbed and unperturbed inputs.
03. Provides an effective, task-agnostic defense mechanism.

Abstract

Multi-modal foundation models align images, text, and other modalities in a shared embedding space but remain vulnerable to adversarial illusions [35], where imperceptible perturbations disrupt cross-modal alignment and mislead downstream tasks. To counteract the effects of adversarial illusions, we propose a task-agnostic mitigation mechanism that purifies the attacker's perturbed input using generative models, e.g., Variational Autoencoders (VAEs), to restore natural alignment. To further enhance the defense mechanism, we adopt a generative sampling strategy combined with a consensus-based aggregation scheme over the outcomes of the generated samples. Our experiments on ImageBind, a state-of-the-art multi-modal encoder, show that our approach substantially reduces the illusion attack success rates to near-zero and improves cross-modal alignment in unperturbed and perturbed input…
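
The sample-then-aggregate idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how consensus is computed, so a simple majority vote over nearest-label predictions is assumed here. All names (`consensus_predict`, `sample_reconstruction`, `embed`, `label_embeddings`) are hypothetical, the identity `embed` stands in for a real encoder such as ImageBind, and the noisy sampler stands in for drawing reconstructions from a trained VAE.

```python
import math
import random
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest_label(embedding, label_embeddings):
    """Return the label whose embedding is most similar to `embedding`."""
    return max(label_embeddings,
               key=lambda name: cosine(embedding, label_embeddings[name]))


def consensus_predict(x, sample_reconstruction, embed, label_embeddings,
                      n_samples=16, rng=None):
    """Draw several generative reconstructions of x and majority-vote.

    ASSUMPTION: majority voting is one possible reading of
    "consensus-based aggregation"; the paper may aggregate differently.
    Returns (winning_label, fraction_of_samples_that_agreed).
    """
    rng = rng or random.Random(0)
    votes = Counter()
    for _ in range(n_samples):
        x_hat = sample_reconstruction(x, rng)  # stand-in for a VAE sample
        votes[nearest_label(embed(x_hat), label_embeddings)] += 1
    label, count = votes.most_common(1)[0]
    return label, count / n_samples


# Toy stand-ins (hypothetical): a 2-D "input", an identity encoder, and a
# sampler that adds small Gaussian noise in place of a trained VAE.
def toy_sampler(x, rng, sigma=0.05):
    return [xi + rng.gauss(0.0, sigma) for xi in x]


def toy_embed(x):
    return x


toy_labels = {"dog": [1.0, 0.0], "cat": [0.0, 1.0]}

if __name__ == "__main__":
    label, agreement = consensus_predict(
        [1.0, 0.2], toy_sampler, toy_embed, toy_labels, n_samples=16)
    print(label, agreement)
```

In a real pipeline, `sample_reconstruction` would encode the possibly perturbed input with the VAE, draw a latent sample, and decode it, while `embed` would call the multi-modal encoder. The agreement fraction returned alongside the label offers a rough indication of how stable the prediction is across samples.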

Code & Models

Repositories

fatemehakb/adversarial-illusions-mitigation (GitHub)
