MirrorGuard: Toward Secure Computer-Use Agents via Simulation-to-Real Reasoning Correction

Wenqi Zhang; Yulin Shen; Changyue Jiang; Jiarun Dai; Geng Hong; Xudong Pan

arXiv:2601.12822·cs.AI·January 21, 2026

MirrorGuard: Toward Secure Computer-Use Agents via Simulation-to-Real Reasoning Correction

Wenqi Zhang, Yulin Shen, Changyue Jiang, Jiarun Dai, Geng Hong, Xudong Pan

PDF

Open Access 1 Models

TL;DR

MirrorGuard is a novel simulation-based framework that enhances the security of autonomous computer-use agents by predicting and correcting unsafe reasoning before real-world actions occur, significantly reducing security risks.

Contribution

The paper introduces a neural-symbolic simulation pipeline for training security defenses in GUIs, enabling real-time correction of unsafe reasoning in autonomous agents.

Findings

01

Reduces unsafe actions from 66.5% to 13.0% on ByteDance UI-TARS.

02

Outperforms state-of-the-art GuardAgent in safety and false refusal rate.

03

Demonstrates robustness across diverse benchmarks and architectures.

Abstract

Large foundation models are integrated into Computer Use Agents (CUAs), enabling autonomous interaction with operating systems through graphical user interfaces (GUIs) to perform complex tasks. This autonomy introduces serious security risks: malicious instructions or visual prompt injections can trigger unsafe reasoning and cause harmful system-level actions. Existing defenses, such as detection-based blocking, prevent damage but often abort tasks prematurely, reducing agent utility. In this paper, we present MirrorGuard, a plug-and-play defense framework that uses simulation-based training to improve CUA security in the real world. To reduce the cost of large-scale training in operating systems, we propose a novel neural-symbolic simulation pipeline, which generates realistic, high-risk GUI interaction trajectories entirely in a text-based simulated environment, which captures unsafe…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Models

🤗
WhitzardAgent/MirrorGuard
model· 69 dl
69 dl

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsSecurity and Verification in Computing · Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques