H2HTalk: Evaluating Large Language Models as Emotional Companion
Boyang Wang, Yalun Wu, Hongcheng Guo, Zhoujun Li

TL;DR
H2HTalk is a benchmark for evaluating large language models as emotional companions. It covers empathy, personality development, and safety in support conversations across 4,650 curated scenarios, and adds a new attachment-theory-based safety module.
Contribution
This paper introduces H2HTalk, the first large-scale benchmark for assessing LLMs as emotional companions, including a novel attachment-theory-based safety module and diverse real-world scenarios.
Findings
Models struggle with implicit user needs and evolving conversations.
Long-horizon planning and memory retention are key challenges.
H2HTalk provides a new standard for evaluating emotionally intelligent LLMs.
Abstract
As digital emotional support needs grow, Large Language Model companions offer promising authentic, always-available empathy, though rigorous evaluation lags behind model advancement. We present Heart-to-Heart Talk (H2HTalk), a benchmark assessing companions across personality development and empathetic interaction, balancing emotional intelligence with linguistic fluency. H2HTalk features 4,650 curated scenarios spanning dialogue, recollection, and itinerary planning that mirror real-world support conversations, substantially exceeding previous datasets in scale and diversity. We incorporate a Secure Attachment Persona (SAP) module implementing attachment-theory principles for safer interactions. Benchmarking 50 LLMs with our unified protocol reveals that long-horizon planning and memory retention remain key challenges, with models struggling when user needs are implicit or evolve…
