Co-Evolutionary Multi-Modal Alignment via Structured Adversarial Evolution

Guoxin Shi; Haoyu Wang; Zaihui Yang; Yuxing Wang; Yongzhe Chang

arXiv:2603.01784·cs.CR·March 3, 2026

TL;DR

This paper introduces CEMMA, a co-evolutionary framework for multimodal safety alignment that dynamically adapts to evolving adversarial attacks, significantly improving robustness and generalization of large language models.

Contribution

It presents a novel adaptive framework with evolutionary attack and defense mechanisms for multimodal alignment, moving beyond static adversarial supervision.

Findings

01

Increased attack success rate of the evolved jailbreak prompts.

02

Enhanced robustness and generalization across benchmarks.

03

Maintains compatibility with inference-time defenses.

Abstract

Adversarial behavior plays a central role in aligning large language models with human values. However, existing alignment methods largely rely on static adversarial settings, which fundamentally limit robustness, particularly in multimodal settings with a larger attack surface. In this work, we move beyond static adversarial supervision and introduce co-evolutionary alignment with evolving attacks, instantiated by CEMMA (Co-Evolutionary Multi-Modal Alignment), an automated and adaptive framework for multimodal safety alignment. We introduce an Evolutionary Attacker that decomposes adversarial prompts into method templates and harmful intents. By employing genetic operators, including mutation, crossover, and differential evolution, it enables simple seed attacks to inherit the structural efficacy of sophisticated jailbreaks. The Adaptive Defender is iteratively updated on the…
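The attacker described in the abstract can be pictured with a toy evolutionary loop. The sketch below is only an illustration under stated assumptions, not CEMMA's implementation: `Candidate`, `crossover`, `mutate`, `evolve`, the abstract segment labels, and the fitness function are all hypothetical names introduced here. It represents each prompt as a method template (a tuple of opaque segment labels) paired with an intent placeholder, and applies crossover and mutation to the template only. Differential evolution and the defender update are omitted because the truncated abstract does not specify them.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical decomposition: a method template plus an intent label."""
    template: tuple  # sequence of abstract segment labels (placeholders)
    intent: str      # opaque intent placeholder, never modified by operators


def crossover(a, b, rng):
    """One-point crossover on templates; the child keeps a's intent."""
    cut_a = rng.randint(0, len(a.template))
    cut_b = rng.randint(0, len(b.template))
    child = a.template[:cut_a] + b.template[cut_b:]
    # Fall back to the parent template if the splice came out empty.
    return Candidate(child or a.template, a.intent)


def mutate(c, segments, rng):
    """Replace one template segment with a random segment from the pool."""
    i = rng.randrange(len(c.template))
    new = c.template[:i] + (rng.choice(segments),) + c.template[i + 1:]
    return Candidate(new, c.intent)


def evolve(population, fitness, segments, generations=5, seed=0):
    """Elitist loop: breed children, then keep the top-scoring candidates."""
    rng = random.Random(seed)
    pop = list(population)
    size = len(pop)  # assumes at least two seed candidates
    for _ in range(generations):
        children = []
        for _ in range(size):
            a, b = rng.sample(pop, 2)
            children.append(mutate(crossover(a, b, rng), segments, rng))
        # Parents compete with children, so the best score never decreases.
        pop = sorted(pop + children, key=fitness, reverse=True)[:size]
    return pop


# Toy usage with placeholder data; the fitness is an assumed stand-in for
# whatever success signal a real red-teaming pipeline would supply.
segments = ["S1", "S2", "S3", "S4"]
seeds = [
    Candidate(("S1",), "INTENT_A"),
    Candidate(("S2", "S2"), "INTENT_B"),
    Candidate(("S1", "S3"), "INTENT_A"),
]
toy_fitness = lambda c: len(set(c.template))
survivors = evolve(seeds, toy_fitness, segments)
```

Keeping the intent fixed while recombining templates mirrors the abstract's claim that simple seed attacks can inherit structure from more sophisticated ones; in a real system the toy fitness would be replaced by a measured outcome against the defender.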

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Natural Language Processing Techniques