Agentic Reward Modeling: Verifying GUI Agent via Online Proactive Interaction
Chaoqun Cui, Jing Huang, Shijing Wang, Liming Zheng, Qingchao Kong, Zhixiong Zeng

TL;DR
This paper introduces VAGEN, an interactive framework that verifies GUI agents by proactively probing the environment, overcoming the limitations of passive evaluation and improving verification accuracy on complex GUI tasks.
Contribution
We propose VAGEN, a novel agentic verification framework that uses autonomous interaction to verify GUI agents, addressing scalability and partial observability issues.
Findings
VAGEN outperforms LLM-as-a-Judge baselines in accuracy.
Proactive probing enhances verification reliability.
Test-time scaling further improves evaluation results.
Abstract
Reinforcement learning with verifiable rewards (RLVR) is pivotal for the continuous evolution of GUI agents, yet existing evaluation paradigms face significant limitations. Rule-based methods suffer from poor scalability and cannot handle open-ended tasks, while LLM-as-a-Judge approaches rely on passive visual observation, often failing to capture latent system states due to partial state observability. To address these challenges, we advocate for a paradigm shift from passive evaluation to Agentic Interactive Verification. We introduce VAGEN, a framework that employs a verifier agent equipped with interaction tools to autonomously plan verification strategies and proactively probe the environment for evidence of task completion. Leveraging the insight that GUI tasks are typically "easy to verify but hard to solve", VAGEN overcomes the bottlenecks of visual limitations. Experimental…
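To make the contrast between passive observation and interactive verification concrete, here is a minimal sketch in Python. It is not the authors' implementation: `ToyEnv`, `passive_judge`, `interactive_verifier`, and the `page.key` naming scheme are all hypothetical, invented only to illustrate how a verifier that can act on the environment may recover state that a single screenshot does not show.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyEnv:
    """Hypothetical GUI environment (an assumption, not from the paper).

    `visible` is what a screenshot of the current screen shows;
    `latent` is state that only appears after navigating to another page.
    """
    visible: Dict[str, str] = field(default_factory=dict)
    latent: Dict[str, str] = field(default_factory=dict)

    def screenshot(self) -> Dict[str, str]:
        # Passive observation: only the visible part of the state.
        return dict(self.visible)

    def open_page(self, page: str) -> Dict[str, str]:
        # Probing action: reveals latent entries whose key starts with "<page>.".
        prefix = page + "."
        return {k: v for k, v in self.latent.items() if k.startswith(prefix)}


def passive_judge(env: ToyEnv, expected: Dict[str, str]) -> bool:
    """Judge task completion from one screenshot, with no interaction."""
    shot = env.screenshot()
    return all(shot.get(k) == v for k, v in expected.items())


def interactive_verifier(env: ToyEnv, expected: Dict[str, str]) -> bool:
    """Gather evidence by probing pages that own the not-yet-observed keys."""
    evidence = env.screenshot()
    for key in expected:
        if key not in evidence:
            page = key.split(".", 1)[0]  # simple "plan": open the owning page
            evidence.update(env.open_page(page))
    return all(evidence.get(k) == v for k, v in expected.items())


# The task "turn Wi-Fi off" was completed, but the result lives on another page.
env = ToyEnv(visible={"home.title": "Settings"}, latent={"wifi.enabled": "off"})
goal = {"wifi.enabled": "off"}
passive_ok = passive_judge(env, goal)
interactive_ok = interactive_verifier(env, goal)
```

In this toy setting the passive judge cannot confirm a task that was in fact completed, because the relevant state never appears on screen, while the probing verifier can. A real verifier agent would plan its probes with a model rather than a fixed key-to-page rule.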
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Advanced Software Engineering Methodologies
