TL;DR
CORA is a framework for GUI agents that provides statistical safety guarantees by selectively executing actions and routing risky ones for human intervention, validated on a new mobile safety benchmark.
Contribution
It introduces a formal, risk-controlled safety mechanism for autonomous GUI agents, combining conformal risk control with multimodal reasoning and a new safety benchmark.
Findings
CORA improves the safety-helpfulness-interruption trade-off on the Phone-Harm benchmark.
The framework effectively calibrates action execution to user-defined risk budgets.
Experiments show CORA outperforms diverse baselines in safety and utility.
Abstract
Graphical user interface (GUI) agents powered by vision language models (VLMs) are rapidly moving from passive assistance to autonomous operation. However, this unrestricted action space exposes users to severe and irreversible financial, privacy or social harm. Existing safeguards, which rely on prompt engineering, brittle heuristics and VLM-as-critic, lack formal verification and user-tunable guarantees. We propose CORA (COnformal Risk-controlled GUI Agent), a post-policy, pre-action safeguarding framework that provides statistical guarantees on harmful executed actions. CORA reformulates safety as selective action execution: we train a Guardian model to estimate action-conditional risk for each proposed step. Rather than thresholding raw scores, we leverage Conformal Risk Control to calibrate an execute/abstain boundary that satisfies a user-specified risk budget and route rejected actions…
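To make the calibration idea concrete, here is a minimal sketch of how an execute/abstain threshold could be chosen with the standard Conformal Risk Control rule for a loss bounded by 1. It is not the paper's implementation. The function names `calibrate_threshold` and `route`, the per-action 0/1 loss ("executed and harmful"), and the convention that a lower Guardian score means lower risk are all assumptions made for illustration.

```python
import numpy as np

def calibrate_threshold(scores, harmful, alpha):
    """Pick the largest threshold tau whose CRC-adjusted risk is <= alpha.

    Assumptions (illustrative, not taken from the paper):
      - scores[i] is a risk score for calibration action i (higher = riskier),
      - harmful[i] is True if executing that action would be harmful,
      - an action is executed when score <= tau,
      - the loss is 1 when a harmful action is executed, else 0.
    """
    scores = np.asarray(scores, dtype=float)
    harmful = np.asarray(harmful, dtype=bool)
    n = len(scores)
    best = -np.inf  # -inf means "abstain on everything"
    # The loss is non-decreasing in tau, so scan candidates in ascending order.
    for tau in np.unique(scores):
        executed_harmful = np.sum(harmful & (scores <= tau))
        # CRC bound for a loss bounded by 1: (n * empirical_risk + 1) / (n + 1).
        bound = (executed_harmful + 1) / (n + 1)
        if bound <= alpha:
            best = tau
        else:
            break
    return float(best)

def route(score, tau):
    """Execute low-risk actions; hand the rest to a human."""
    return "execute" if score <= tau else "ask_human"

# Toy calibration set (made-up numbers, for illustration only).
cal_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
cal_harmful = [0, 0, 0, 0, 0, 1, 0, 1, 1]
tau = calibrate_threshold(cal_scores, cal_harmful, alpha=0.25)
```

With a very small risk budget relative to the calibration set size, the additive `1 / (n + 1)` term alone exceeds `alpha`, so the sketch returns `-inf` and every action is routed to the human. This reflects the general trade-off between the risk budget and the interruption rate.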
