SafeGround: Know When to Trust GUI Grounding Models via Uncertainty Calibration
Qingni Wang, Yue Fan, Xin Eric Wang

TL;DR
SafeGround is an uncertainty calibration framework for GUI grounding models. It enables risk-aware predictions with a controlled false discovery rate (FDR) and improves system-level accuracy by up to 5.38 percentage points.
Contribution
The paper presents an uncertainty-aware calibration method for GUI grounding models that makes predictions more reliable and lets the system control the risk of acting on incorrect ones.
Findings
SafeGround's uncertainty estimates distinguish correct from incorrect predictions better than existing baselines.
Calibrated thresholds provide rigorous risk control and improve accuracy.
Improves system-level accuracy by up to 5.38 percentage points.
Abstract
Graphical User Interface (GUI) grounding aims to translate natural language instructions into executable screen coordinates, enabling automated GUI interaction. However, incorrect grounding can result in costly, hard-to-reverse actions (e.g., erroneous payment approvals), raising concerns about model reliability. In this paper, we introduce SafeGround, an uncertainty-aware framework for GUI grounding models that enables risk-aware predictions through calibration before test time. SafeGround uses a distribution-aware uncertainty quantification method to capture the spatial dispersion of stochastic samples drawn from the outputs of any given model. Then, through the calibration process, SafeGround derives a test-time decision threshold with statistically guaranteed false discovery rate (FDR) control. We apply SafeGround to multiple GUI grounding models for the challenging ScreenSpot-Pro…
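
The two steps the abstract describes, scoring uncertainty from the spread of sampled predictions and calibrating an acceptance threshold, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names (`dispersion`, `calibrate_threshold`, `accept`), the choice of mean distance to the centroid as the dispersion measure, and the use of a per-threshold Hoeffding bound with a `delta` parameter are all assumptions made for illustration. In particular, the bound below is applied to each candidate threshold separately and ignores the multiple-comparison correction that a rigorous FDR guarantee would require. FDR is taken here to mean the fraction of accepted predictions that are incorrect.

```python
import math


def dispersion(points):
    """Mean Euclidean distance of sampled (x, y) clicks from their centroid.

    Illustrative uncertainty score: a larger spread means less agreement
    among the stochastic samples.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return sum(math.hypot(x - cx, y - cy) for x, y in points) / n


def calibrate_threshold(scores, correct, alpha, delta=0.1):
    """Pick the largest threshold t such that, on the calibration set,
    the error rate among predictions with score <= t, plus a Hoeffding
    margin, is at most alpha. Returns None if no threshold qualifies.

    Simplified assumption: the bound is checked per threshold, without
    correcting for the search over thresholds.
    """
    pairs = sorted(zip(scores, correct))
    best = None
    errors = 0
    for k, (score, ok) in enumerate(pairs, start=1):
        errors += 0 if ok else 1
        # With tied scores, evaluate only once the whole tie is included.
        if k < len(pairs) and pairs[k][0] == score:
            continue
        upper = errors / k + math.sqrt(math.log(1 / delta) / (2 * k))
        if upper <= alpha:
            best = score
    return best


def accept(points, threshold):
    """Act on a prediction only if its dispersion is within the threshold;
    otherwise abstain (for example, defer to a human or a stronger model)."""
    return threshold is not None and dispersion(points) <= threshold
```

At test time, a system built this way would sample several click predictions for an instruction, call `accept` on them with the calibrated threshold, and execute the action only when the call returns `True`.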
Taxonomy
Topics: Interactive and Immersive Displays · Adversarial Robustness in Machine Learning · Security and Verification in Computing
