ESC-Judge: A Framework for Comparing Emotional Support Conversational Agents

Navid Madani; Rohini Srihari

arXiv:2505.12531 · cs.CL · May 20, 2025

TL;DR

ESC-Judge is a scalable, automated framework grounded in counseling theory that evaluates emotional-support chatbots by simulating realistic scenarios and comparing model responses with human-level reliability.

Contribution

It introduces the first end-to-end, theory-grounded, automated evaluation framework for emotional-support LLMs, enabling scalable and interpretable comparisons.

Findings

1. Matches human annotator decisions with over 80% accuracy
2. Automated evaluation reduces cost and time compared to human annotation
3. Provides a transparent, theory-based assessment of emotional support quality
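The agreement figure in the findings can be read as the share of comparisons where the judge's pairwise choice matches the human decision. The sketch below shows one way such a rate could be computed. It assumes that each item is a single "A" or "B" preference and that several annotators' votes are collapsed by majority. Both the majority rule and the helper names `majority_label` and `agreement` are hypothetical, and the toy labels are invented rather than taken from the paper.

```python
from collections import Counter
from typing import Sequence


def majority_label(votes: Sequence[str]) -> str:
    """Collapse several annotators' votes into one label.
    Assumption: ties go to the label seen first (Counter keeps insertion order)."""
    return Counter(votes).most_common(1)[0][0]


def agreement(judge_labels: Sequence[str],
              human_votes: Sequence[Sequence[str]]) -> float:
    """Fraction of items where the judge matches the human majority label."""
    if not judge_labels or len(judge_labels) != len(human_votes):
        raise ValueError("need one non-empty vote list per judge label")
    matches = sum(j == majority_label(v) for j, v in zip(judge_labels, human_votes))
    return matches / len(judge_labels)


# Invented toy data: 5 pairwise comparisons, 3 human annotators each.
judge_labels_demo = ["A", "B", "A", "A", "B"]
human_votes_demo = [["A", "A", "B"], ["B", "B", "B"], ["B", "B", "A"],
                    ["A", "B", "A"], ["B", "A", "B"]]

if __name__ == "__main__":
    print(agreement(judge_labels_demo, human_votes_demo))  # 0.8
```

In this toy case the judge disagrees with the human majority only on the third item, which gives 4/5 agreement.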

Abstract

Large language models (LLMs) increasingly power mental-health chatbots, yet the field still lacks a scalable, theory-grounded way to decide which model is most effective to deploy. We present ESC-Judge, the first end-to-end evaluation framework that (i) grounds head-to-head comparisons of emotional-support LLMs in Clara Hill's established Exploration-Insight-Action counseling model, providing a structured and interpretable view of performance, and (ii) fully automates the evaluation pipeline at scale. ESC-Judge operates in three stages: first, it synthesizes realistic help-seeker roles by sampling empirically salient attributes such as stressors, personality, and life history; second, it has two candidate support agents conduct separate sessions with the same role, isolating model-specific strategies; and third, it asks a specialized judge LLM to express pairwise preferences across…
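The three stages in the abstract (role synthesis, separate sessions with a shared role, and pairwise judging) can be sketched as a small pipeline. This is a hypothetical illustration, not the authors' implementation. The attribute values, the per-stage judging dimensions (assumed here from Hill's Exploration-Insight-Action model), and the names `synthesize_role`, `compare`, `ATTRIBUTE_POOLS`, and `DIMENSIONS` are all assumptions. In practice the agents and the judge would be LLM calls. Here they are toy functions so that the sketch runs on its own.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical attribute pools: the paper samples stressors, personality, and
# life history, but these concrete values are invented for illustration.
ATTRIBUTE_POOLS: Dict[str, List[str]] = {
    "stressor": ["job loss", "exam pressure", "breakup"],
    "personality": ["introverted", "anxious", "outgoing"],
    "life_history": ["recently moved", "first-generation student", "new parent"],
}

# Assumed judging dimensions, one per stage of Hill's model.
DIMENSIONS: List[str] = ["exploration", "insight", "action"]


@dataclass
class Role:
    attributes: Dict[str, str]


Agent = Callable[[Role], List[str]]                 # role -> transcript (list of turns)
Judge = Callable[[List[str], List[str], str], str]  # (transcript A, transcript B, dimension) -> "A" or "B"


def synthesize_role(rng: random.Random) -> Role:
    """Stage 1: sample one value per attribute to build a help-seeker role."""
    return Role({name: rng.choice(pool) for name, pool in ATTRIBUTE_POOLS.items()})


def compare(agent_a: Agent, agent_b: Agent, judge: Judge,
            n_roles: int, seed: int = 0) -> Dict[str, float]:
    """Stages 2 and 3: each agent holds its own session with the same role,
    then the judge states a pairwise preference per dimension.
    Returns agent A's win rate on each dimension."""
    rng = random.Random(seed)
    wins = {d: 0 for d in DIMENSIONS}
    for _ in range(n_roles):
        role = synthesize_role(rng)
        transcript_a = agent_a(role)  # independent sessions, shared role
        transcript_b = agent_b(role)
        for d in DIMENSIONS:
            if judge(transcript_a, transcript_b, d) == "A":
                wins[d] += 1
    return {d: wins[d] / n_roles for d in DIMENSIONS}


# Toy stand-ins for the support agents and the judge LLM (purely illustrative).
def verbose_agent(role: Role) -> List[str]:
    return [f"Can you tell me more about the {role.attributes['stressor']}?",
            "How has that been affecting you?"]


def terse_agent(role: Role) -> List[str]:
    return ["OK."]


def length_judge(a: List[str], b: List[str], dimension: str) -> str:
    # A real judge would be a prompted LLM; this toy one prefers the longer transcript.
    return "A" if len(" ".join(a)) >= len(" ".join(b)) else "B"


if __name__ == "__main__":
    print(compare(verbose_agent, terse_agent, length_judge, n_roles=10))
    # {'exploration': 1.0, 'insight': 1.0, 'action': 1.0}
```

Reusing the same sampled role for both agents is what isolates model-specific strategies. Any difference in the judge's verdict then comes from how each agent handled an identical help-seeker.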

Videos

ESC-Judge: A Framework for Comparing Emotional Support Conversational Agents · Underline

Taxonomy

Topics: Digital Mental Health Interventions · Mental Health via Writing · Artificial Intelligence in Healthcare and Education