HeartBench: Probing Core Dimensions of Anthropomorphic Intelligence in LLMs

Jiaxin Liu; Peiyi Tu; Wenyu Chen; Yihong Zhuang; Xinxia Ling; Anji Zhou; Chenxi Wang; Zhuo Han; Zhengkai Yang; Junbo Zhao; Zenan Huang; Yuanyuan Wang

arXiv:2512.21849 · cs.CL · December 29, 2025

Open Access

TL;DR

HeartBench is a new evaluation framework for Chinese LLMs that measures their ability to handle social, emotional, and ethical nuances, revealing significant performance gaps in complex scenarios.

Contribution

This paper introduces HeartBench, a theory-driven, rubric-based benchmark for assessing anthropomorphic intelligence in Chinese LLMs, addressing a critical evaluation gap.

Findings

01. Even leading models reach only 60% of the ideal score on anthropomorphic tasks.

02. Performance drops significantly in scenarios involving subtle emotional cues and complex ethical trade-offs.

03. HeartBench provides a standardized metric and methodology for human-aligned AI evaluation.

Abstract

While Large Language Models (LLMs) have achieved remarkable success in cognitive and reasoning benchmarks, they exhibit a persistent deficit in anthropomorphic intelligence: the capacity to navigate complex social, emotional, and ethical nuances. This gap is particularly acute in the Chinese linguistic and cultural context, where a lack of specialized evaluation frameworks and high-quality socio-emotional data impedes progress. To address these limitations, we present HeartBench, a framework designed to evaluate the integrated emotional, cultural, and ethical dimensions of Chinese LLMs. Grounded in authentic psychological counseling scenarios and developed in collaboration with clinical experts, the benchmark is structured around a theory-driven taxonomy comprising five primary dimensions and 15 secondary capabilities. We implement a case-specific, rubric-based methodology that…

Taxonomy

Topics: Mental Health via Writing · Artificial Intelligence in Healthcare and Education · Cognitive Abilities and Testing