UserBench: An Interactive Gym Environment for User-Centric Agents

Cheng Qian; Zuxin Liu; Akshara Prabhakar; Zhiwei Liu; Jianguo Zhang; Haolin Chen; Heng Ji; Weiran Yao; Shelby Heinecke; Silvio Savarese; Caiming Xiong; Huan Wang

arXiv:2507.22034 · cs.AI · July 30, 2025

TL;DR

UserBench is a new interactive benchmark for evaluating user-centric agents, revealing current LLMs' limitations in proactive collaboration and preference understanding during multi-turn, goal-driven interactions.

Contribution

We introduce UserBench, a benchmark for assessing agents' ability to proactively clarify goals and preferences in user interactions, highlighting gaps in current LLM capabilities.

Findings

01

Model answers fully align with all user intents only 20% of the time on average

02

Most models uncover less than 30% of user preferences

03

Current agents struggle with proactive, collaborative behavior

Abstract

Large Language Model (LLM)-based agents have made impressive progress in reasoning and tool use, enabling them to solve complex tasks. However, their ability to proactively collaborate with users, especially when goals are vague, evolving, or indirectly expressed, remains underexplored. To address this gap, we introduce UserBench, a user-centric benchmark designed to evaluate agents in multi-turn, preference-driven interactions. UserBench features simulated users who start with underspecified goals and reveal preferences incrementally, requiring agents to proactively clarify intent and make grounded decisions with tools. Our evaluation of leading open- and closed-source LLMs reveals a significant disconnect between task completion and user alignment. For instance, models provide answers that fully align with all user intents only 20% of the time on average, and even the most advanced…
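
The interaction pattern described in the abstract (a simulated user who starts with an underspecified goal and reveals preferences only when the agent asks) can be illustrated with a toy environment. This is a minimal sketch under stated assumptions, not the UserBench API: the class `ToyUserEnv`, the `ask:<topic>` action format, the `coverage` metric, and the travel preferences are all hypothetical names and data introduced here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyUserEnv:
    """Toy simulated user. Hypothetical sketch, NOT the UserBench API."""

    # Hidden preferences the agent must uncover by asking (made-up data).
    hidden: dict = field(default_factory=lambda: {
        "budget": "under $500",
        "airline": "no red-eye flights",
    })
    # Topics the agent has successfully asked about so far.
    revealed: set = field(default_factory=set)

    def reset(self) -> str:
        """Start an episode and return the underspecified initial goal."""
        self.revealed = set()
        return "I want to plan a trip."

    def step(self, action: str) -> str:
        """Handle an 'ask:<topic>' action, revealing at most one preference."""
        kind, _, topic = action.partition(":")
        if kind == "ask" and topic in self.hidden:
            self.revealed.add(topic)
            return self.hidden[topic]
        return "I'm not sure what you mean."

    def coverage(self) -> float:
        """Fraction of hidden preferences uncovered so far."""
        return len(self.revealed) / len(self.hidden)


env = ToyUserEnv()
print(env.reset())             # the vague opening goal
print(env.step("ask:budget"))  # one preference is revealed on request
print(env.coverage())          # share of preferences uncovered so far
```

An agent that answers immediately after `reset()` would finish the task with zero coverage, which is one way to picture the gap between task completion and user alignment that the abstract reports.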
