CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments
Kung-Hsiang Huang, Akshara Prabhakar, Sidharth Dhawan, Yixin Mao, Huan Wang, Silvio Savarese, Caiming Xiong, Philippe Laban, Chien-Sheng Wu

TL;DR
CRMArena is a new benchmark for evaluating AI agents on realistic CRM tasks, revealing current LLM limitations and guiding future improvements for enterprise applications.
Contribution
The paper introduces CRMArena, a comprehensive benchmark based on real-world CRM tasks and data, to evaluate AI agent performance in professional environments.
Findings
State-of-the-art LLM agents succeed in fewer than 40% of tasks with ReAct prompting, and fewer than 55% even with function-calling abilities.
Current agents struggle with function-calling and rule-following.
CRMArena provides a realistic and challenging testbed for AI in CRM.
Abstract
Customer Relationship Management (CRM) systems are vital for modern enterprises, providing a foundation for managing customer interactions and data. Integrating AI agents into CRM systems can automate routine processes and enhance personalized service. However, deploying and evaluating these agents is challenging due to the lack of realistic benchmarks that reflect the complexity of real-world CRM tasks. To address this issue, we introduce CRMArena, a novel benchmark designed to evaluate AI agents on realistic tasks grounded in professional work environments. Following guidance from CRM experts and industry best practices, we designed CRMArena with nine customer service tasks distributed across three personas: service agent, analyst, and manager. The benchmark includes 16 commonly used industrial objects (e.g., account, order, knowledge article, case) with high interconnectivity, along…
Taxonomy
Topics: Business Process Modeling and Analysis · Multi-Agent Systems and Negotiation
