Temporal UI State Inconsistency in Desktop GUI Agents: Formalizing and Defending Against TOCTOU Attacks on Computer-Use Agents

Wenpeng Xu

arXiv:2604.18860 · cs.CR · April 22, 2026

TL;DR

This paper formalizes UI state inconsistency vulnerabilities in desktop GUI agents, demonstrates three concrete attack primitives that exploit them, and proposes a layered verification defense (PUSV) that detects such manipulation with minimal overhead.

Contribution

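It introduces a formal model of visual atomicity violations, in which the UI state changes between the screenshot an agent reasons over and the moment its action is dispatched. It also introduces a layered pre-execution UI verification method that substantially improves detection of TOCTOU attacks on desktop GUIs.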
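One way to write the visual-atomicity condition down is sketched below. The symbols are our own assumptions, not the paper's notation: $S(t)$ is the full UI state at time $t$, $\pi$ renders a state to a screenshot, and $\mathrm{eff}(a, S)$ is the effect action $a$ has when dispatched in state $S$. The gap $\Delta$ is the observation-to-action delay, whose mean the abstract reports as 6.51 s on OSWorld workloads.

```latex
\begin{aligned}
O &= \pi\big(S(t_o)\big), \qquad a = \mathrm{Agent}(O), \qquad t_d = t_o + \Delta, \\
\text{Visual Atomicity Violation:}\quad
  &\mathrm{eff}\big(a, S(t_d)\big) \neq \mathrm{eff}\big(a, S(t_o)\big).
\end{aligned}
```

In this sketch the violation is defined on the action's effect rather than on the screenshot, so it does not require $\pi\big(S(t_o)\big)$ to look suspicious. That matches an attack that leaves zero visual evidence at observation time.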

Findings

1. PUSV achieves a 100% action interception rate in adversarial trials.

2. Primitive B (Window Focus Manipulation) achieves a 100% success rate with zero visual evidence at observation time.

3. Different attack primitives require different detection signals, which validates the layered design of the defense.

Abstract

GUI agents that control desktop computers via screenshot-and-click loops introduce a new class of vulnerability: the observation-to-action gap (mean 6.51 s on real OSWorld workloads) creates a Time-Of-Check to Time-Of-Use (TOCTOU) window during which an unprivileged attacker can manipulate the UI state. We formalize this as a Visual Atomicity Violation and characterize three concrete attack primitives: (A) Notification Overlay Hijack, (B) Window Focus Manipulation, and (C) Web DOM Injection. Primitive B, the closest desktop analog to Android Action Rebinding, achieves 100% action-redirection success rate with zero visual evidence at the observation time. We propose Pre-execution UI State Verification (PUSV), a lightweight three-layer defense that re-verifies the UI state immediately before each action dispatch: masked pixel SSIM at the click target (L1), global screenshot diff (L2a), and…
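The two pixel layers named in the abstract (L1 and L2a) can be illustrated with a minimal sketch. It is not the paper's implementation, and several pieces are assumptions. Frames are small grayscale nested lists. The "mask" is simply a square patch around the click target. SSIM is computed over that patch as a single window, whereas standard SSIM averages many sliding windows. The helper names (`verify_before_dispatch`, `ssim_one_window`, `changed_fraction`) are hypothetical, as are all thresholds (`radius=8`, `ssim_min=0.95`, `global_max=0.02`, `pixel_tol=8.0`). The abstract's remaining layer is truncated here and is not reconstructed.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical frame type: grayscale pixels in [0, 255], indexed frame[y][x].
Frame = List[List[float]]


@dataclass
class Verdict:
    allow: bool
    reason: str


def crop(frame: Frame, cx: int, cy: int, r: int) -> List[float]:
    """Flatten the square patch of radius r around (cx, cy), clipped to the frame."""
    h, w = len(frame), len(frame[0])
    return [frame[y][x]
            for y in range(max(0, cy - r), min(h, cy + r + 1))
            for x in range(max(0, cx - r), min(w, cx + r + 1))]


def ssim_one_window(a: List[float], b: List[float], data_range: float = 255.0) -> float:
    """SSIM over a single window (the whole patch); standard SSIM averages many windows."""
    n = len(a)
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))


def changed_fraction(a: Frame, b: Frame, pixel_tol: float = 8.0) -> float:
    """Fraction of pixels that moved by more than pixel_tol (frames must match in size)."""
    total = changed = 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += 1
            if abs(x - y) > pixel_tol:
                changed += 1
    return changed / total


def verify_before_dispatch(observed: Frame, current: Frame, click: Tuple[int, int],
                           radius: int = 8, ssim_min: float = 0.95,
                           global_max: float = 0.02) -> Verdict:
    """Re-check the screen just before dispatching a click; thresholds are illustrative."""
    cx, cy = click
    # L1 (sketch): local structural similarity around the click target.
    local = ssim_one_window(crop(observed, cx, cy, radius), crop(current, cx, cy, radius))
    if local < ssim_min:
        return Verdict(False, f"L1: target SSIM {local:.3f} < {ssim_min}")
    # L2a (sketch): coarse whole-screen pixel diff.
    frac = changed_fraction(observed, current)
    if frac > global_max:
        return Verdict(False, f"L2a: {frac:.1%} of pixels changed")
    return Verdict(True, "L1/L2a passed")


# Toy 32x32 frames: a gradient "screen" and three possible states at dispatch time.
base = [[float((x * 7 + y * 3) % 256) for x in range(32)] for y in range(32)]
popup = [row[:] for row in base]   # overlay drawn over the click target
for y in range(10, 22):
    for x in range(10, 22):
        popup[y][x] = 255.0
corner = [row[:] for row in base]  # change far from the click target
for y in range(26, 32):
    for x in range(26, 32):
        corner[y][x] = 255.0

print(verify_before_dispatch(base, popup, (16, 16)))
print(verify_before_dispatch(base, corner, (8, 8)))
print(verify_before_dispatch(base, [row[:] for row in base], (16, 16)))
```

In this toy run, the overlay on the target is blocked by the local check. The change away from the target slips past L1 but trips the global diff. The unchanged frame passes both layers. Two consequences follow. First, any manipulation that leaves dispatch-time pixels identical would pass a purely pixel-based check and would need a non-pixel signal. Second, this is consistent with the finding that different primitives need different detection signals.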
