ScreenSearch: Uncertainty-Aware OS Exploration
Michael Solodko, Justin Wagle

TL;DR
ScreenSearch is a system for large-scale desktop exploration that combines structural screen retrieval with an ambiguity-aware PUCT graph-bandit, so that exploration both discovers new GUI states and reduces uncertainty about visually similar ones.
Contribution
The paper introduces ScreenSearch, an OS exploration system that pairs structural screen retrieval and deduplication with ambiguity-aware exploration over a shared state graph to improve desktop state discovery.
Findings
Collected over 1 million screenshots across 11 applications.
Discovered a trade-off between novelty and ambiguity reduction in exploration policies.
Ablation studies show proposal priors improve state discovery.
Abstract
Desktop GUI agents operate under partial observability: visually similar screens can correspond to different underlying workflow states, so locally plausible actions can lead to sharply different outcomes. We frame this as a problem of computer/OS state exploration, where effective behavior requires both expanding the reachable frontier and reducing ambiguity before committing. We present ScreenSearch, a system that combines structural screen retrieval and deduplication with an ambiguity-aware PUCT graph-bandit for large-scale desktop exploration. The retrieval layer converts UIA trees into location-aware structural features, indexes related screens through sparse token search and metadata filters, and maintains a shared deduplicated state graph across VM workers. On top of this graph, we define a scalable ambiguity signal based on matched-action outcome dispersion. If similar screens…
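The abstract is cut off before it defines the ambiguity signal, so the following is only a minimal sketch of how a dispersion-based ambiguity term could enter a PUCT-style selection rule. Everything here is an assumption for illustration, not the authors' formulation: dispersion is taken to be the normalized Shannon entropy of next-state IDs observed after the same action on similar screens, it is added to the standard PUCT score as a weighted bonus, and the names `outcome_dispersion`, `puct_score`, `select_action`, `c_puct`, and `ambiguity_weight` are hypothetical.

```python
import math
from collections import Counter


def outcome_dispersion(outcomes):
    """Normalized Shannon entropy of observed next-state ids (assumed signal).

    Returns 0.0 when every matched action led to the same state and 1.0 when
    the observed outcomes are spread uniformly over distinct states.
    """
    counts = Counter(outcomes)
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(len(counts))


def puct_score(q, prior, parent_visits, action_visits, ambiguity,
               c_puct=1.5, ambiguity_weight=0.5):
    """Standard PUCT value-plus-exploration term with an assumed ambiguity bonus."""
    explore = c_puct * prior * math.sqrt(parent_visits) / (1 + action_visits)
    return q + explore + ambiguity_weight * ambiguity


def select_action(stats, parent_visits):
    """Pick the action with the highest score.

    `stats` maps an action name to a dict with keys
    'q', 'prior', 'visits', and 'outcomes' (hypothetical layout).
    """
    def score(action):
        s = stats[action]
        return puct_score(s["q"], s["prior"], parent_visits, s["visits"],
                          outcome_dispersion(s["outcomes"]))
    return max(stats, key=score)


# Two actions with identical value, prior, and visit counts; only the
# spread of their observed outcomes differs.
example_stats = {
    "click_save": {"q": 0.2, "prior": 0.5, "visits": 3,
                   "outcomes": ["s1", "s1", "s1"]},
    "click_open": {"q": 0.2, "prior": 0.5, "visits": 3,
                   "outcomes": ["s2", "s3", "s4"]},
}
chosen = select_action(example_stats, parent_visits=6)
```

In this toy case the two actions tie on the usual PUCT terms, so the bonus alone decides the choice and the action with scattered outcomes is selected. Whether the actual system adds ambiguity as a bonus, uses it to reweight priors, or applies it some other way is not stated in the visible text.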
