ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models

Haiquan Zhao; Lingyu Li; Shisong Chen; Shuqi Kong; Jiaan Wang; Kexin Huang; Tianle Gu; Yixu Wang; Wang Jian; Dandan Liang; Zhixu Li; Yan Teng; Yanghua Xiao; Yingchun Wang

arXiv:2406.14952 · cs.CL · October 29, 2024 · 1 citation

TL;DR

This paper introduces ESC-Eval, a comprehensive framework for evaluating emotion support conversations in large language models using role-playing agents and human annotations, highlighting the performance gap between LLMs and humans.

Contribution

The paper proposes a novel evaluation framework with a role-playing agent and introduces ESC-RANK for automated scoring of ESC models, advancing assessment methods in this domain.

Findings

01. ESC-oriented LLMs outperform general AI-assistant LLMs on ESC tasks.

02. A performance gap remains between LLMs and humans in ESC.

03. ESC-RANK surpasses GPT-4 by more than 35 points in automated scoring.

Abstract

Emotion Support Conversation (ESC) is a crucial application, which aims to reduce human stress, offer emotional guidance, and ultimately enhance human mental and physical well-being. With the advancement of Large Language Models (LLMs), many researchers have employed LLMs as the ESC models. However, the evaluation of these LLM-based ESCs remains uncertain. Inspired by the awesome development of role-playing agents, we propose an ESC Evaluation framework (ESC-Eval), which uses a role-playing agent to interact with ESC models, followed by a manual evaluation of the interactive dialogues. In detail, we first re-organize 2,801 role-playing cards from seven existing datasets to define the roles of the role-playing agent. Second, we train a specific role-playing model called ESC-Role which behaves more like a confused person than GPT-4. Third, through ESC-Role and organized role cards, we…
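
The interaction step the abstract describes (a role-playing agent conversing with an ESC model, with the resulting dialogue then handed to human annotators) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: `RoleCard`, `simulate_dialogue`, and the two stub responders are hypothetical names introduced here, and real calls to ESC-Role and to the ESC model under test would replace the stubs.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (speaker, utterance)


@dataclass
class RoleCard:
    """Hypothetical stand-in for one role-playing card."""
    card_id: str
    description: str


# Assumed interfaces: the seeker agent sees its role card plus the history,
# while the ESC model under test sees only the dialogue history.
SeekerFn = Callable[[RoleCard, List[Turn]], str]
SupporterFn = Callable[[List[Turn]], str]


def simulate_dialogue(card: RoleCard, seeker: SeekerFn,
                      supporter: SupporterFn, max_turns: int = 3) -> List[Turn]:
    """Alternate seeker and supporter utterances for max_turns rounds."""
    history: List[Turn] = []
    for _ in range(max_turns):
        history.append(("seeker", seeker(card, history)))
        history.append(("supporter", supporter(history)))
    return history


def stub_seeker(card: RoleCard, history: List[Turn]) -> str:
    # Placeholder for a role-playing model such as ESC-Role.
    round_no = len(history) // 2 + 1
    return f"(round {round_no}) {card.description}"


def stub_supporter(history: List[Turn]) -> str:
    # Placeholder for the ESC model being evaluated.
    return "That sounds difficult. Could you tell me more?"


card = RoleCard("demo-1", "I feel overwhelmed at work.")
dialogue = simulate_dialogue(card, stub_seeker, stub_supporter, max_turns=2)
for speaker, utterance in dialogue:
    print(f"{speaker}: {utterance}")
```

The transcript returned by `simulate_dialogue` is what would then go to manual annotation (or to an automated scorer); scoring itself is deliberately left out of this sketch.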

Taxonomy

Topics: Mental Health via Writing

Methods: Attention Is All You Need · Softmax · Layer Normalization · Absolute Position Encodings · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Dropout · Adam · Linear Layer