AURA: A Diagnostic Framework for Tracking User Satisfaction of Interactive Planning Agents

Takyoung Kim; Janvijay Singh; Shuhaib Mehri; Emre Can Acikgoz; Sagnik Mukherjee; Nimet Beyza Bozdag; Sumuk Shashidhar; Gokhan Tur; Dilek Hakkani-Tür

arXiv:2505.01592·cs.CL·December 8, 2025

TL;DR

AURA is a diagnostic framework that evaluates user satisfaction with interactive planning agents by analyzing their decision-making process, beyond just task success, to improve agent design and user experience.

Contribution

We introduce AURA, a comprehensive assessment framework that diagnoses strengths and weaknesses in agent decision-making stages to better align with user satisfaction.

Findings

01. Agents vary in performance across behavioral stages

02. User satisfaction depends on both outcomes and intermediate behaviors

03. Limitations identified in current user simulators for task planning

Abstract

The growing capabilities of large language models (LLMs) in instruction following and context understanding have ushered in an era of agents with numerous applications. Among these, task planning agents have become especially prominent in realistic scenarios involving complex internal pipelines, such as context understanding, tool management, and response generation. However, existing benchmarks predominantly evaluate agent performance based on task completion as a proxy for overall effectiveness. We hypothesize that merely improving task completion is misaligned with maximizing user satisfaction, as users interact with the entire agentic process and not only the end result. To address this gap, we propose AURA, an Agent-User inteRaction Assessment framework that conceptualizes the behavioral stages of interactive task planning agents. AURA offers a comprehensive assessment of agent through a…

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Artificial Intelligence in Games · Software Engineering Research

Methods: Sparse Evolutionary Training