Dynamic Adversarial Reinforcement Learning for Robust Multimodal Large Language Models

Yicheng Bao; Xuhong Wang; Qiaosheng Zhang; Chaochao Lu; Xia Hu; Xin Tan

arXiv:2602.22227·cs.LG·March 5, 2026

Open Access

TL;DR

This paper presents AOT (Adversarial Opponent Training), a self-play adversarial training framework that improves the perceptual robustness of Multimodal Large Language Models (MLLMs) on visually complex scenes and reduces hallucinations.

Contribution

The paper introduces AOT-SFT, a large-scale adversarial dataset for bootstrapping MLLM robustness, and AOT, a self-play training method that co-evolves an image-editing Attacker and a Defender MLLM to enhance MLLM robustness.

Findings

01. AOT improves the perceptual robustness of MLLMs.

02. AOT reduces hallucinations in multimodal models.

03. AOT establishes a scalable adversarial training paradigm.

Abstract

Despite their impressive capabilities, Multimodal Large Language Models (MLLMs) exhibit perceptual fragility when confronted with visually complex scenes. This weakness stems from a reliance on finite training datasets, which are prohibitively expensive to scale and impose a ceiling on model robustness. We introduce AOT-SFT, a large-scale adversarial dataset for bootstrapping MLLM robustness. Building on this, we propose AOT (Adversarial Opponent Training), a self-play framework that forges MLLM robustness by creating its own training data. Our method orchestrates a co-evolution between an image-editing Attacker and a Defender MLLM, where the Attacker generates a diverse and dynamic curriculum of image manipulations, forcing the Defender to adapt and improve. Extensive experiments demonstrate that AOT enhances the Defender's perceptual robustness and reduces…
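The co-evolution described in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's implementation: the edit names in `EDITS`, the scalar "skill" standing in for the Defender MLLM, the multiplicative weight update for the Attacker, and the `self_play` function itself are all hypothetical simplifications chosen only to show the shape of a self-play loop in which the Attacker is rewarded when the Defender fails and the Defender trains on whatever the Attacker produces.

```python
import random

# Hypothetical edit types; the paper's actual manipulation set is not listed here.
EDITS = ["occlude", "insert_object", "recolor"]


def self_play(rounds=200, lr=0.1, seed=0):
    """Toy attacker/defender co-evolution (an assumption-laden sketch)."""
    rng = random.Random(seed)
    # Defender: probability of answering correctly under each edit type.
    defender = {e: 0.5 for e in EDITS}
    # Attacker: preference weight per edit type, sampled proportionally.
    attacker = {e: 1.0 for e in EDITS}

    for _ in range(rounds):
        # Attacker samples an edit in proportion to its current weights.
        edit = rng.choices(EDITS, weights=[attacker[e] for e in EDITS])[0]
        # Defender succeeds with probability equal to its skill on that edit.
        defended = rng.random() < defender[edit]
        # Assumed zero-sum reward: the attacker gains when the defender fails.
        if defended:
            attacker[edit] = max(0.1, attacker[edit] * (1 - lr))
        else:
            attacker[edit] *= 1 + lr
        # Defender trains on the sampled edit, nudging its skill toward 1.
        defender[edit] += lr * (1.0 - defender[edit])

    return attacker, defender


if __name__ == "__main__":
    attacker, defender = self_play()
    print("attacker weights:", attacker)
    print("defender skill:", defender)
```

In this toy setting the Attacker shifts weight toward edits the Defender still fails on, which is one simple way a "dynamic curriculum" can emerge. In the paper the Attacker is an image-editing model and the Defender is an MLLM, so the real updates would be model training steps, not scalar nudges.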

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Explainable Artificial Intelligence (XAI)