GEM: Gaussian Embedding Modeling for Out-of-Distribution Detection in GUI Agents
Zheng Wu, Pengzhou Cheng, Zongru Wu, Lingzhong Dong, Zhuosheng Zhang

TL;DR
GEM detects out-of-distribution (OOD) instructions for GUI agents by fitting a Gaussian mixture model to the distances between input embeddings and their centroid, which improves detection accuracy and task success rates across diverse environments.
Contribution
This work introduces GEM, a novel Gaussian mixture model approach that effectively detects OOD inputs in GUI agents by leveraging clustering patterns in embedding space.
Findings
Achieves a 23.70% accuracy improvement over baselines
Increases the step-wise success rate by 9.40% with cloud assistance
Demonstrates strong generalization across nine backbone models
Abstract
Graphical user interface (GUI) agents have recently emerged as an intriguing paradigm for human-computer interaction, capable of automatically executing user instructions to operate intelligent terminal devices. However, when encountering out-of-distribution (OOD) instructions that violate environmental constraints or exceed the current capabilities of agents, GUI agents may suffer task breakdowns or even pose security threats. Therefore, effective OOD detection for GUI agents is essential. Traditional OOD detection methods perform suboptimally in this domain due to the complex embedding space and evolving GUI environments. In this work, we observe that the in-distribution input semantic space of GUI agents exhibits a clustering pattern with respect to the distance from the centroid. Based on this finding, we propose GEM, a novel method based on fitting a Gaussian mixture model over…
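The idea described in the abstract, modeling how far in-distribution inputs sit from their centroid, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`fit_detector`, `is_ood`), the choice of two mixture components, the 5% likelihood quantile used as a threshold, and the synthetic random vectors standing in for real agent embeddings are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(xs, k=2, iters=100):
    """Fit a k-component 1-D Gaussian mixture with plain EM."""
    n = len(xs)
    ordered = sorted(xs)
    # Initialise means at evenly spaced quantiles, variances at the global variance.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(xs) / n
    variances = [sum((x - grand_mean) ** 2 for x in xs) / n + 1e-6] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each distance.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj + 1e-6
    return weights, means, variances


def log_likelihood(x, gmm):
    """Log density of a distance x under the fitted mixture."""
    weights, means, variances = gmm
    density = sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
    return math.log(max(density, 1e-300))


def fit_detector(id_embeddings, k=2, quantile=0.05):
    """Fit on in-distribution embeddings; returns (centroid, gmm, threshold)."""
    dim = len(id_embeddings[0])
    centroid = [sum(e[i] for e in id_embeddings) / len(id_embeddings) for i in range(dim)]
    dists = [math.dist(e, centroid) for e in id_embeddings]
    gmm = fit_gmm_1d(dists, k)
    # Assumed rule: the threshold is a low quantile of in-distribution log-likelihoods.
    scores = sorted(log_likelihood(d, gmm) for d in dists)
    threshold = scores[int(quantile * len(scores))]
    return centroid, gmm, threshold


def is_ood(embedding, detector):
    """Flag an input whose centroid distance is unlikely under the mixture."""
    centroid, gmm, threshold = detector
    return log_likelihood(math.dist(embedding, centroid), gmm) < threshold


# Synthetic stand-in for in-distribution embeddings (hypothetical data).
rng = random.Random(0)
id_embeddings = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]
detector = fit_detector(id_embeddings)
print(is_ood([10.0] * 8, detector))
```

Scoring by likelihood rather than by raw distance means an input can be flagged for being unusually close to the centroid as well as unusually far from it, which is one way to respect a clustering pattern in the distances. In practice the embeddings would come from the agent's encoder, and the component count would be chosen on held-out data.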
Taxonomy
Topics: Advanced Malware Detection Techniques · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
