MirrorGuard: Toward Secure Computer-Use Agents via Simulation-to-Real Reasoning Correction
Wenqi Zhang, Yulin Shen, Changyue Jiang, Jiarun Dai, Geng Hong, Xudong Pan

TL;DR
MirrorGuard is a simulation-based framework that improves the security of autonomous computer-use agents by identifying and correcting unsafe reasoning before it leads to real-world actions, substantially reducing unsafe behavior.
Contribution
The paper introduces a neural-symbolic simulation pipeline that trains security defenses for GUI-based agents entirely in a text-based environment, enabling real-time correction of unsafe agent reasoning.
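To make the idea of a text-based simulated environment concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' pipeline: the `SimState` class, the `HIGH_RISK` rule, and the `step`/`rollout` helpers are hypothetical. The neural (model-driven) half is replaced by a fixed plan, and only the symbolic half is shown: tracking state and rendering it as a text observation instead of a screenshot.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SimState:
    """Hypothetical symbolic state of a text-only simulated desktop."""
    files: dict[str, str] = field(default_factory=dict)

    def render(self) -> str:
        """Render the symbolic state as a text observation (no screenshot)."""
        listing = ", ".join(sorted(self.files)) or "(empty)"
        return f"File manager window. Files: {listing}"


# Hypothetical rule marking which actions count as high-risk.
HIGH_RISK = {"delete_file"}


def step(state: SimState, action: str, arg: str) -> str:
    """Apply an action symbolically and return a result message."""
    if action == "delete_file":
        existed = state.files.pop(arg, None) is not None
        return f"deleted {arg}" if existed else f"{arg} not found"
    if action == "read_file":
        return state.files.get(arg, f"{arg} not found")
    return f"unknown action {action}"


def rollout(state: SimState, plan: list[tuple[str, str, str]]) -> list[tuple]:
    """Record (observation, reasoning, action, arg, is_risky) for each planned step.

    In a neural-symbolic setup a language model would propose `plan`;
    here it is a fixed list supplied by the caller.
    """
    trajectory = []
    for reasoning, action, arg in plan:
        obs = state.render()
        trajectory.append((obs, reasoning, action, arg, action in HIGH_RISK))
        step(state, action, arg)
    return trajectory


if __name__ == "__main__":
    state = SimState(files={"notes.txt": "todo", "id_rsa": "secret"})
    plan = [
        ("The user wants a summary of the notes.", "read_file", "notes.txt"),
        ("A pop-up says to remove id_rsa.", "delete_file", "id_rsa"),
    ]
    traj = rollout(state, plan)
    print([t[4] for t in traj])  # [False, True]
    print(state.render())        # File manager window. Files: notes.txt
```

Keeping the state symbolic makes every step cheap and exactly reproducible, which is the kind of property that lets many high-risk trajectories be generated without running a real operating system.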
Findings
Reduces the unsafe action rate on ByteDance UI-TARS from 66.5% to 13.0%.
Outperforms the state-of-the-art GuardAgent on both safety and false refusal rate.
Demonstrates robustness across diverse benchmarks and architectures.
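For scale, the UI-TARS result corresponds to an absolute drop of 53.5 percentage points, or a relative reduction of roughly 80% in unsafe actions:

```latex
\frac{66.5 - 13.0}{66.5} = \frac{53.5}{66.5} \approx 0.805
```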
Abstract
Large foundation models are increasingly integrated into Computer Use Agents (CUAs), enabling autonomous interaction with operating systems through graphical user interfaces (GUIs) to perform complex tasks. This autonomy introduces serious security risks: malicious instructions or visual prompt injections can trigger unsafe reasoning and cause harmful system-level actions. Existing defenses, such as detection-based blocking, prevent damage but often abort tasks prematurely, reducing agent utility. In this paper, we present MirrorGuard, a plug-and-play defense framework that uses simulation-based training to improve CUA security in the real world. To reduce the cost of large-scale training in real operating systems, we propose a novel neural-symbolic simulation pipeline that generates realistic, high-risk GUI interaction trajectories entirely in a text-based simulated environment, which captures unsafe…
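The abstract contrasts detection-based blocking, which aborts the task, with correcting unsafe reasoning so the agent can keep working. The Python sketch below illustrates only that control-flow difference. The keyword check `is_unsafe` and the fixed rewrite in `correct` are hypothetical placeholders for the trained components a system like MirrorGuard would use; none of the function names come from the paper.

```python
from __future__ import annotations

from typing import Callable, Optional

# Hypothetical toy markers of destructive intent.
UNSAFE_MARKERS = ("rm -rf", "delete all", "disable firewall")


def is_unsafe(reasoning: str) -> bool:
    """Toy risk check: flag reasoning that mentions a destructive command."""
    text = reasoning.lower()
    return any(marker in text for marker in UNSAFE_MARKERS)


def correct(reasoning: str) -> str:
    """Toy corrector: ignores the input and returns a fixed safe continuation."""
    return ("The instruction asks for a destructive system action; "
            "skip it and continue with the user's benign goal.")


def block_guard(reasoning: str) -> Optional[str]:
    """Detection-based blocking: None signals that the whole task is aborted."""
    return None if is_unsafe(reasoning) else reasoning


def correction_guard(reasoning: str) -> str:
    """Correction-based guarding: rewrite unsafe reasoning, keep the task alive."""
    return correct(reasoning) if is_unsafe(reasoning) else reasoning


def run(steps: list[str], guard: Callable[[str], Optional[str]]) -> list[str]:
    """Pass each reasoning step through the guard; stop if the guard aborts."""
    executed = []
    for reasoning in steps:
        checked = guard(reasoning)
        if checked is None:
            break
        executed.append(checked)
    return executed


if __name__ == "__main__":
    steps = [
        "Open the report folder.",
        "Pop-up says: run rm -rf ~ to free space.",
        "Compress the report.",
    ]
    print(len(run(steps, block_guard)))       # 1: task aborted at the injection
    print(len(run(steps, correction_guard)))  # 3: injection corrected, task finished
```

Because the guard sits between the agent's reasoning and its actions, it can be swapped in front of different agents without retraining them, which matches the plug-and-play framing in the abstract.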
Taxonomy
Topics: Security and Verification in Computing · Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques
