FineState-Bench: Benchmarking State-Conditioned Grounding for Fine-grained GUI State Setting

Fengxian Ji; Jingpu Yang; Zirui Song; Yuanxi Wang; Zhexuan Cui; Yuke Li; Qian Jiang; Xiuying Chen

arXiv:2604.27974·cs.CV·May 1, 2026

TL;DR

FineState-Bench is a benchmark for fine-grained, state-conditioned GUI interaction across desktop, web, and mobile platforms. Exact goal-state success remains low (at most 32.8% on Web), which leaves substantial room for improvement in visual grounding accuracy.

Contribution

The paper introduces the FineState-Bench benchmark, the four-stage FineState-Metrics diagnostic pipeline, and a visual assistant for evaluating and analyzing how vision-language models perform on fine-grained GUI state-setting tasks.

Findings

01 Exact goal-state success rates are low, with a maximum of 32.8% on Web.

02 VDA localization hints improve success rates by approximately 15 points.

03 Current models still struggle with reliable fine-grained state-conditioned interactions.

Abstract

Despite the rapid progress of large vision-language models (LVLMs), fine-grained, state-conditioned GUI interaction remains challenging. Current evaluations offer limited coverage, imprecise target-state definitions, and an overreliance on final-task success, obscuring where and why agents fail. To address this gap, we introduce FineState-Bench, a benchmark that evaluates whether an agent can correctly ground an instruction to the intended UI control and reach the exact target state. FineState-Bench comprises 2,209 instances across desktop, web, and mobile platforms, spanning four interaction families and 23 UI component types, with each instance explicitly specifying an exact target state for fine-grained state setting. We further propose FineState-Metrics, a four-stage diagnostic pipeline with stage-wise success rates: Localization Success Rate (SR@Loc), Interaction…
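As a rough illustration of what stage-wise success rates look like in practice, the sketch below computes one rate per stage from per-instance pass/fail outcomes. It is not the paper's implementation. The record layout, the function name `stage_success_rates`, and the stage keys are hypothetical: only a localization stage is named in the visible part of the abstract, so the remaining stages appear as placeholders. The sketch also assumes each rate is taken over all instances, which the abstract does not confirm.

```python
def stage_success_rates(records, stages):
    """Return, for each stage, the fraction of instances that passed it.

    records: list of dicts mapping a stage key to True/False (hypothetical layout).
    stages:  ordered list of stage keys to report.
    Assumption: every rate uses the full instance count as its denominator.
    """
    if not records:
        raise ValueError("records must be non-empty")
    total = len(records)
    return {s: sum(1 for r in records if r[s]) / total for s in stages}


# Placeholder stage keys: "loc" stands for localization; the others are
# unnamed stand-ins for the remaining three stages of the pipeline.
STAGES = ["loc", "stage_2", "stage_3", "stage_4"]

# Four toy instances (made-up outcomes, not benchmark data).
records = [
    {"loc": True,  "stage_2": True,  "stage_3": True,  "stage_4": False},
    {"loc": True,  "stage_2": False, "stage_3": False, "stage_4": False},
    {"loc": False, "stage_2": False, "stage_3": False, "stage_4": False},
    {"loc": True,  "stage_2": True,  "stage_3": True,  "stage_4": True},
]

rates = stage_success_rates(records, STAGES)
for stage in STAGES:
    print(f"{stage}: {rates[stage]:.2f}")
```

Reporting a rate per stage, rather than only final success, shows where the drop-off happens: in the toy data three of four instances pass localization, but only one reaches the final stage.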

Code & Models

Repositories

FengxianJi/FineState-Bench (GitHub)
