Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal Attacks

Haoyu Liu; Dingcheng Li; Lukas Rutishauser; Zeyu Zheng

arXiv:2603.04364 · cs.LG · March 5, 2026

Open Access

TL;DR

This paper introduces DMAST, a training framework that makes multimodal web agents more robust to cross-modal adversarial attacks by co-training the agent and an attacker through imitation learning, supervised fine-tuning, and adversarial reinforcement learning.

Contribution

The paper proposes DMAST, a multi-stage adversarial training method that formalizes the agent-attacker interaction as a two-player zero-sum Markov game, improving the robustness and efficiency of multimodal web agents.
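
As a rough illustration of what a two-player zero-sum Markov game objective generally looks like, the sketch below writes the standard max-min form. The notation is assumed for illustration and is not taken from the paper: \pi is the agent policy, \mu the attacker policy, s_t the state, a_t and b_t the agent and attacker actions, r the agent's reward (the attacker receives -r), \gamma a discount factor, and T the horizon.

```latex
% Generic zero-sum Markov game objective; all symbols are hypothetical
% illustration, not the paper's own notation.
\[
  J(\pi, \mu) = \mathbb{E}_{\pi,\mu}\!\left[\sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t, b_t)\right],
  \qquad
  \pi^{\star} \in \arg\max_{\pi} \, \min_{\mu} \, J(\pi, \mu)
\]
```

Under this reading, the agent is trained to do well against the worst-case attacker, while the attacker is trained to minimize the same quantity.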

Findings

1. DMAST significantly reduces adversarial vulnerabilities.

2. DMAST doubles task completion efficiency.

3. DMAST outperforms existing defenses in robustness and generalization.

Abstract

Multimodal web agents that process both screenshots and accessibility trees are increasingly deployed to interact with web interfaces, yet their dual-stream architecture opens an underexplored attack surface: an adversary who injects content into the webpage DOM simultaneously corrupts both observation channels with a consistent deceptive narrative. Our vulnerability analysis on MiniWob++ reveals that attacks including a visual component far outperform text-only injections, exposing critical gaps in text-centric VLM safety training. Motivated by this finding, we propose Dual-Modality Multi-Stage Adversarial Safety Training (DMAST), a framework that formalizes the agent-attacker interaction as a two-player zero-sum Markov game and co-trains both players through a three-stage pipeline: (1) imitation learning from a strong teacher model, (2) oracle-guided supervised fine-tuning that uses a…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Advanced Malware Detection Techniques