GIEBench: Towards Holistic Evaluation of Group Identity-based Empathy for Large Language Models

Leyan Wang; Yonggang Jin; Tianhao Shen; Tianyu Zheng; Xinrun Du; Chenchen Zhang; Wenhao Huang; Jiaheng Liu; Shi Wang; Ge Zhang; Liuyu Xiang; Zhaofeng He

arXiv:2406.14903 · cs.AI · June 25, 2024

Open Access

TL;DR

GIEBench is a comprehensive benchmark designed to evaluate large language models' ability to demonstrate empathy towards diverse group identities, addressing a gap in existing empathy assessments by including identity-specific questions.

Contribution

The paper introduces GIEBench, a benchmark of 999 single-choice questions covering 97 group identities across 11 identity dimensions, to assess how well LLMs empathize with specific social identities. It argues that alignment in this area needs improvement.

Findings

01 LLMs understand different identity standpoints

02 LLMs lack consistent empathy across identities

03 Explicit instructions improve empathetic responses

Abstract

As large language models (LLMs) continue to develop and gain widespread application, the ability of LLMs to exhibit empathy towards diverse group identities and understand their perspectives is increasingly recognized as critical. Most existing benchmarks for empathy evaluation of LLMs focus primarily on universal human emotions, such as sadness and pain, often overlooking the context of individuals' group identities. To address this gap, we introduce GIEBench, a comprehensive benchmark that includes 11 identity dimensions, covering 97 group identities with a total of 999 single-choice questions related to specific group identities. GIEBench is designed to evaluate the empathy of LLMs when presented with specific group identities such as gender, age, occupation, and race, emphasizing their ability to respond from the standpoint of the identified group. This supports the ongoing…

Code & Models

Repositories

giebench/giebench
Official

Taxonomy

Topics: Topic Modeling · Advanced Graph Neural Networks

Methods: Focus