SafePred: A Predictive Guardrail for Computer-Using Agents via World Models

Yurun Chen; Zeyi Liao; Ping Yin; Taotao Xie; Keting Yin; Shengyu Zhang

arXiv:2602.01725·cs.CL·February 3, 2026

TL;DR

SafePred introduces a predictive guardrail framework for computer-using agents that proactively predicts and mitigates long-term risks, significantly enhancing safety and utility over reactive methods.

Contribution

The paper presents SafePred, a novel framework that uses world models to predict future risks and guide safe decision-making in CUAs, addressing limitations of reactive guardrails.
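As a rough illustration of the predictive idea (not the authors' implementation), the sketch below rolls a candidate action forward through a toy world model and blocks it if any predicted future state is risky. Every name here (`toy_world_model`, `toy_risk`, `predictive_guardrail`, the action strings, the 0.5 threshold) is a hypothetical stand-in chosen for this example; the paper's actual world model and risk scoring are not described on this page.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable

State = FrozenSet[str]  # toy state: the set of facts that currently hold


@dataclass(frozen=True)
class Verdict:
    allowed: bool
    predicted_risk: float


def toy_world_model(state: State, action: str) -> State:
    """Hypothetical stand-in for a learned world model: predicts the next state."""
    facts = set(state)
    if action == "delete_logs":
        facts.discard("logs_present")
    elif action == "run_audit":
        facts.add("audit_requested")
    return frozenset(facts)


def toy_risk(state: State) -> float:
    """Hypothetical risk score in [0, 1]: an audit without logs is untraceable."""
    if "audit_requested" in state and "logs_present" not in state:
        return 0.9
    return 0.0


def predictive_guardrail(
    state: State,
    action: str,
    expected_future_actions: Iterable[str],
    world_model: Callable[[State, str], State],
    risk: Callable[[State], float],
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
) -> Verdict:
    """Allow `action` only if no predicted state along the rollout is too risky."""
    current = world_model(state, action)
    worst = risk(current)
    for future_action in expected_future_actions:
        current = world_model(current, future_action)
        worst = max(worst, risk(current))
    return Verdict(allowed=worst < threshold, predicted_risk=worst)


if __name__ == "__main__":
    start = frozenset({"logs_present"})
    # A one-step check sees nothing wrong right after the deletion.
    print(toy_risk(toy_world_model(start, "delete_logs")))
    # The rollout surfaces the delayed risk once an audit is expected later.
    print(predictive_guardrail(start, "delete_logs", ["run_audit"],
                               toy_world_model, toy_risk))
```

In this toy setup the one-step check stands in for a reactive guardrail: deleting logs looks harmless on its own, and the risk only appears when the rollout reaches the later audit.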

Findings

01. Achieves over 97.6% safety performance in experiments.

02. Improves task utility by up to 21.4%.

03. Effectively reduces high-risk behaviors.

Abstract

With the widespread deployment of Computer-using Agents (CUAs) in complex real-world environments, prevalent long-term risks often lead to severe and irreversible consequences. Most existing guardrails for CUAs adopt a reactive approach, constraining agent behavior only within the current observation space. While these guardrails can prevent immediate short-term risks (e.g., clicking on a phishing link), they cannot proactively avoid long-term risks: seemingly reasonable actions can lead to high-risk consequences that emerge with a delay (e.g., cleaning logs leads to future audits being untraceable), which reactive guardrails cannot identify within the current observation space. To address these limitations, we propose a predictive guardrail approach, with the core idea of aligning predicted future risks with current decisions. Based on this approach, we present SafePred, a predictive…

Taxonomy

Topics: Software System Performance and Reliability · Adversarial Robustness in Machine Learning · Safety Systems Engineering in Autonomy