TEA-Bench: A Systematic Benchmarking of Tool-enhanced Emotional Support Dialogue Agent

Xingyu Sui; Yanyan Zhao; Yulin Hu; Jiahe Guo; Weixiang Zhao; Bing Qin

arXiv:2601.18700 · cs.AI · May 11, 2026


TL;DR

TEA-Bench is a new benchmark for evaluating tool-augmented emotional support dialogue agents, emphasizing factual grounding and reduced hallucination in multi-turn conversations.

Contribution

It introduces TEA-Bench, the first interactive benchmark for tool-augmented emotional support agents, together with process-level metrics and an accompanying dataset, and shows how tool use affects emotional support quality.

Findings

1. Tool augmentation improves support quality and reduces hallucination.
2. Model capacity influences the effectiveness of tool use.
3. Supervised fine-tuning enhances in-distribution support but generalizes poorly.

Abstract

Emotional Support Conversation (ESC) requires not only affective expression but also grounded instrumental support to provide trustworthy guidance. However, existing ESC systems and benchmarks largely focus on affective support in text-only settings, overlooking how external tools can enable factual grounding and reduce hallucination in multi-turn emotional support. We introduce TEA-Bench, the first interactive benchmark for evaluating tool-augmented agents in ESC, featuring realistic emotional scenarios, an MCP-style tool environment, and process-level metrics that jointly assess the quality and factual grounding of emotional support. Experiments on nine LLMs show that tool augmentation generally improves emotional support quality and reduces hallucination, but the gains are strongly capacity-dependent: stronger models use tools more selectively and effectively, while weaker models benefit…
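
As a rough illustration of what an "MCP-style" tool environment involves, the sketch below builds a tool invocation in the general shape defined by the Model Context Protocol (a JSON-RPC 2.0 request with method "tools/call") and extracts the text a tool returns, which is the material an agent would ground its advice in. It assumes TEA-Bench follows the standard MCP message format; the tool name, its arguments, and the simulated response are hypothetical and are not taken from the benchmark.

```python
import json
from typing import Any


def make_tool_call(request_id: int, tool_name: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP-style 'tools/call' request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


def extract_text(response_json: str) -> str:
    """Join the text items of a tool result; raise if the tool reported an error."""
    response = json.loads(response_json)
    result = response["result"]
    if result.get("isError", False):
        raise RuntimeError("tool reported an error")
    return "\n".join(
        item["text"] for item in result.get("content", []) if item.get("type") == "text"
    )


# Hypothetical tool and arguments, for illustration only (not TEA-Bench's tool set).
request = make_tool_call(1, "find_nearby_clinics", {"city": "Boston"})

# Simulated server reply in the MCP result shape (content list plus isError flag).
fake_response = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [{"type": "text", "text": "Example Clinic, open 9-5"}],
        "isError": False,
    },
})
print(extract_text(fake_response))  # → Example Clinic, open 9-5
```

Returning the tool's text verbatim, rather than letting the model paraphrase from memory, is what makes it possible to check a response against retrieved facts, which is the kind of factual grounding the abstract describes.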

Code & Models

Repositories

XingYuSSS/TEA-Bench (GitHub)

Datasets

XingYuSSS/TEA-Dialog (dataset; 44 downloads)