TEA-Bench: A Systematic Benchmarking of Tool-enhanced Emotional Support Dialogue Agent
Xingyu Sui, Yanyan Zhao, Yulin Hu, Jiahe Guo, Weixiang Zhao, Bing Qin

TL;DR
TEA-Bench is a new benchmark for evaluating tool-augmented emotional support dialogue agents, emphasizing factual grounding and reduced hallucination in multi-turn conversations.
Contribution
It introduces TEA-Bench, the first interactive benchmark for tool-augmented emotional support agents, with process-level metrics and a dataset, and highlights the impact of tool use on emotional support quality.
Findings
Tool augmentation improves support quality and reduces hallucination.
Model capacity influences the effectiveness of tool use.
Supervised fine-tuning enhances in-distribution support but generalizes poorly.
Abstract
Emotional Support Conversation (ESC) requires not only affective expression but also grounded instrumental support to provide trustworthy guidance. However, existing ESC systems and benchmarks largely focus on affective support in text-only settings, overlooking how external tools can enable factual grounding and reduce hallucination in multi-turn emotional support. We introduce TEA-Bench, the first interactive benchmark for evaluating tool-augmented agents in ESC, featuring realistic emotional scenarios, an MCP-style tool environment, and process-level metrics that jointly assess the quality and factual grounding of emotional support. Experiments on nine LLMs show that tool augmentation generally improves emotional support quality and reduces hallucination, but the gains are strongly capacity-dependent: stronger models use tools more selectively and effectively, while weaker models benefit…
