TL;DR
BAMI is a training-free bias mitigation method for GUI grounding. It improves model accuracy by addressing two biases: precision bias caused by high image resolution and ambiguity bias caused by intricate interface elements.
Contribution
The paper proposes BAMI, a novel bias mitigation technique that enhances GUI grounding accuracy without additional training, using coarse-to-fine focus and candidate selection strategies.
Findings
BAMI improves the accuracy of GUI grounding models on the ScreenSpot-Pro benchmark.
Applying BAMI to TianXi-Action-7B raises accuracy from 51.9% to 57.8%, a gain of 5.9 percentage points.
Ablation studies show that BAMI is robust across different parameters.
Abstract
GUI grounding is a critical capability for enabling GUI agents to execute tasks such as clicking and dragging. However, in complex scenarios like the ScreenSpot-Pro benchmark, existing models often suffer from suboptimal performance. Utilizing the proposed Masked Prediction Distribution (MPD) attribution method, we identify that the primary sources of errors are twofold: high image resolution (leading to precision bias) and intricate interface elements (resulting in ambiguity bias). To address these challenges, we introduce Bias-Aware Manipulation Inference (BAMI), which incorporates two key manipulations, coarse-to-fine focus and candidate selection, to effectively mitigate these biases. Our extensive experimental results demonstrate that BAMI significantly enhances the accuracy of various GUI grounding models in a training-free setting. For instance, applying our…
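The abstract names coarse-to-fine focus as one of BAMI's two manipulations but does not say how it is implemented. As a rough illustration of the general idea only, the sketch below shows a generic zoom-and-refine loop: predict a click point on the full screenshot, crop a smaller view around that point, and predict again inside the crop. Everything here is an assumption made for illustration and not the paper's method: the `predict` callable (which stands in for a grounding model and returns coordinates normalized to the view it is given), the `crop_ratio` and `rounds` parameters, and the toy model whose output is quantized to mimic limited precision on large images.

```python
from typing import Callable, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels
Point = Tuple[float, float]
Predictor = Callable[[Box], Point]  # hypothetical: returns (u, v) in [0, 1] within the box


def to_pixels(box: Box, uv: Point) -> Point:
    """Map view-normalized coordinates back to full-image pixel coordinates."""
    left, top, right, bottom = box
    return (left + uv[0] * (right - left), top + uv[1] * (bottom - top))


def crop_around(point: Point, size: Tuple[int, int], ratio: float) -> Box:
    """Return a view of `ratio` times the image size, centered on `point` and kept inside the image."""
    width, height = size
    crop_w, crop_h = width * ratio, height * ratio
    left = min(max(point[0] - crop_w / 2, 0.0), width - crop_w)
    top = min(max(point[1] - crop_h / 2, 0.0), height - crop_h)
    return (left, top, left + crop_w, top + crop_h)


def coarse_to_fine(predict: Predictor, size: Tuple[int, int],
                   crop_ratio: float = 0.5, rounds: int = 1) -> Point:
    """Predict on the full image, then re-predict on progressively smaller views."""
    box: Box = (0.0, 0.0, float(size[0]), float(size[1]))
    point = to_pixels(box, predict(box))  # coarse pass over the whole screenshot
    for r in range(1, rounds + 1):
        box = crop_around(point, size, crop_ratio ** r)  # zoom in around the current guess
        point = to_pixels(box, predict(box))  # fine pass inside the crop
    return point


def make_toy_model(target: Point, step: float = 0.1) -> Predictor:
    """Toy stand-in for a grounding model: locates `target` but only to the nearest `step` of the view."""
    def predict(box: Box) -> Point:
        left, top, right, bottom = box
        u = (target[0] - left) / (right - left)
        v = (target[1] - top) / (bottom - top)

        def quantize(t: float) -> float:
            return round(min(max(t, 0.0), 1.0) / step) * step

        return (quantize(u), quantize(v))
    return predict


if __name__ == "__main__":
    model = make_toy_model((3000.0, 500.0))
    print("coarse only:", coarse_to_fine(model, (3840, 2160), rounds=0))
    print("one refinement:", coarse_to_fine(model, (3840, 2160), crop_ratio=0.25, rounds=1))
```

In this toy setup the refinement helps because the model's error scales with the size of the view it sees, so a smaller crop yields a smaller pixel error. Whether a real grounding model behaves this way, and how BAMI chooses its crops, is not stated in this summary.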
