SYNTHEMPATHY: A Scalable Empathy Corpus Generated Using LLMs Without Any Crowdsourcing

Run Chen; Jun Shin; Julia Hirschberg

arXiv:2502.17857·cs.CL·February 26, 2025

Open Access

TL;DR

This paper introduces SYNTHEMPATHY, a large-scale empathetic dialogue corpus generated entirely by LLMs, enabling scalable development of empathetic language models without crowdsourcing.

Contribution

The paper presents a framework for building a large empathetic dialogue dataset with LLM generation, avoiding the cost of crowdsourcing.

Findings

01

Fine-tuning a base Mistral 7B model on SYNTHEMPATHY increases its average empathy score.

02

The corpus contains 105,000 empathetic responses to real-life situations.

03

The approach demonstrates scalable data generation for empathetic dialogue modeling.

Abstract

Previous research has shown that humans are more receptive towards language models that exhibit empathetic behavior. While empathy is essential for developing helpful dialogue agents, very few large corpora containing empathetic dialogues are available for fine-tuning LLMs. The few existing corpora have largely relied on crowdsourcing to simulate empathetic conversations, a process that is expensive, time-consuming, and not scalable to larger datasets. We propose a data generation framework for developing SYNTHEMPATHY, a large corpus containing 105k empathetic responses to real-life situations compiled through LLM generation. A base Mistral 7B model fine-tuned on our SYNTHEMPATHY corpus exhibits an increase in the average empathy score.

Taxonomy

Topics: Natural Language Processing Techniques · Wikis in Education and Collaboration · Semantic Web and Ontologies

Methods: Balanced Selection