AURA: A Diagnostic Framework for Tracking User Satisfaction of Interactive Planning Agents
Takyoung Kim, Janvijay Singh, Shuhaib Mehri, Emre Can Acikgoz, Sagnik Mukherjee, Nimet Beyza Bozdag, Sumuk Shashidhar, Gokhan Tur, Dilek Hakkani-Tür

TL;DR
AURA is a diagnostic framework that evaluates user satisfaction with interactive planning agents by analyzing their decision-making process, beyond just task success, to improve agent design and user experience.
Contribution
We introduce AURA, a comprehensive assessment framework that diagnoses strengths and weaknesses in agent decision-making stages to better align with user satisfaction.
Findings
Agents vary in performance across behavioral stages
User satisfaction depends on both outcomes and intermediate behaviors
Limitations identified in current user simulators for task planning
Abstract
The growing capabilities of large language models (LLMs) in instruction following and context understanding have led to an era of agents with numerous applications. Among these, task planning agents have become especially prominent in realistic scenarios involving complex internal pipelines, such as context understanding, tool management, and response generation. However, existing benchmarks predominantly evaluate agent performance based on task completion as a proxy for overall effectiveness. We hypothesize that merely improving task completion is misaligned with maximizing user satisfaction, as users interact with the entire agentic process and not only the end result. To address this gap, we propose AURA, an Agent-User inteRaction Assessment framework that conceptualizes the behavioral stages of interactive task planning agents. AURA offers a comprehensive assessment of agents through a…
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Artificial Intelligence in Games · Software Engineering Research
Methods: Sparse Evolutionary Training
