ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models
Haiquan Zhao, Lingyu Li, Shisong Chen, Shuqi Kong, Jiaan Wang, Kexin Huang, Tianle Gu, Yixu Wang, Wang Jian, Dandan Liang, Zhixu Li, Yan Teng, Yanghua Xiao, Yingchun Wang

TL;DR
This paper introduces ESC-Eval, a comprehensive framework for evaluating emotion support conversations in large language models using role-playing agents and human annotations, highlighting the performance gap between LLMs and humans.
Contribution
The paper proposes a novel evaluation framework with a role-playing agent and introduces ESC-RANK for automated scoring of ESC models, advancing assessment methods in this domain.
Findings
ESC-oriented LLMs outperform general AI-assistant LLMs in ESC tasks.
There remains a performance gap between LLMs and humans in ESC.
ESC-RANK surpasses GPT-4 by 35 points in automated scoring performance.
Abstract
Emotion Support Conversation (ESC) is a crucial application, which aims to reduce human stress, offer emotional guidance, and ultimately enhance human mental and physical well-being. With the advancement of Large Language Models (LLMs), many researchers have employed LLMs as the ESC models. However, the evaluation of these LLM-based ESCs remains uncertain. Inspired by the awesome development of role-playing agents, we propose an ESC Evaluation framework (ESC-Eval), which uses a role-playing agent to interact with ESC models, followed by a manual evaluation of the interactive dialogues. In detail, we first re-organize 2,801 role-playing cards from seven existing datasets to define the roles of the role-playing agent. Second, we train a specific role-playing model called ESC-Role which behaves more like a confused person than GPT-4. Third, through ESC-Role and organized role cards, we…
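The interaction step described in the abstract, where a role-playing agent conditioned on a role card converses with an ESC model to produce a dialogue for later evaluation, can be illustrated with a minimal sketch. This is not the authors' implementation: `simulate_dialogue`, `toy_seeker`, `toy_supporter`, the string-valued role card, and the `max_turns` parameter are all hypothetical names chosen for illustration, and the two toy functions stand in for real models such as ESC-Role and the ESC model under test.

```python
from typing import Callable, List, Tuple

# A dialogue is a list of (speaker, utterance) pairs.
Dialogue = List[Tuple[str, str]]


def simulate_dialogue(
    card: str,
    seeker: Callable[[str, Dialogue], str],
    supporter: Callable[[Dialogue], str],
    max_turns: int = 3,
) -> Dialogue:
    """Alternate seeker and supporter turns and return the transcript.

    `card` is a hypothetical plain-text role card describing the help-seeker.
    `seeker` plays the role-playing agent; `supporter` plays the ESC model.
    """
    history: Dialogue = []
    for _ in range(max_turns):
        # The role-playing agent speaks first, conditioned on its role card.
        history.append(("seeker", seeker(card, history)))
        # The ESC model under evaluation replies to the dialogue so far.
        history.append(("supporter", supporter(history)))
    return history


def toy_seeker(card: str, history: Dialogue) -> str:
    # Stand-in for a role-playing model; a real agent would call an LLM here.
    turn = len(history) // 2 + 1
    return f"[{card}] seeker message {turn}"


def toy_supporter(history: Dialogue) -> str:
    # Stand-in for the ESC model being evaluated.
    return "I hear you. Can you tell me more?"


if __name__ == "__main__":
    for speaker, text in simulate_dialogue("stressed about exams", toy_seeker, toy_supporter, max_turns=2):
        print(f"{speaker}: {text}")
```

The returned transcript is what would then be passed to human annotators or to an automatic scorer; the scoring step itself is left out because the abstract above does not specify it.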
Taxonomy
Topics: Mental Health via Writing
Methods: Attention Is All You Need · Softmax · Layer Normalization · Absolute Position Encodings · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Dropout · Adam · Linear Layer
