Synth-Empathy: Towards High-Quality Synthetic Empathy Data

Hao Liang; Linzhuang Sun; Jingxuan Wei; Xijie Huang; Linkun Sun; Bihui Yu; Conghui He; Wentao Zhang

arXiv:2407.21669·cs.CL·August 13, 2024

TL;DR

Synth-Empathy introduces an LLM-based pipeline to automatically generate, select, and improve high-quality empathetic data, leading to state-of-the-art results in empathetic response tasks and offering insights into data quality trade-offs.

Contribution

It presents a novel LLM-driven method for generating and selecting high-quality empathetic data, enhancing empathetic response performance and reducing human labeling effort.

Findings

01 Achieved state-of-the-art empathetic response performance.

02 Demonstrated robustness across multiple benchmarks.

03 Provided insights into data quantity and quality trade-offs.

Abstract

In recent years, with the rapid advancements in large language models (LLMs), achieving excellent empathetic response capabilities has become a crucial prerequisite. Consequently, managing and understanding empathetic datasets have gained increasing significance. However, empathetic data are typically human-labeled, leading to insufficient datasets and wasted human labor. In this work, we present Synth-Empathy, an LLM-based data generation and quality and diversity selection pipeline that automatically generates high-quality empathetic data while discarding low-quality data. With the data generated from a low empathetic model, we are able to further improve empathetic response performance and achieve state-of-the-art (SoTA) results across multiple benchmarks. Moreover, our model achieves SoTA performance on various human evaluation benchmarks, demonstrating its effectiveness and…
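
The selection stage described in the abstract (keep high-quality samples, discard low-quality ones, and preserve diversity) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the paper's method: the `Sample` type, the lexicon-based `quality_score`, the Jaccard-overlap diversity check, and both thresholds are hypothetical stand-ins. The generation step is represented only by a list of candidate samples.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    context: str   # the speaker's message
    response: str  # a candidate empathetic reply (assumed to be LLM-generated)


def tokens(text: str) -> list[str]:
    """Lowercase the text and strip basic punctuation from each word."""
    stripped = (w.strip(".,!?").lower() for w in text.split())
    return [w for w in stripped if w]


def quality_score(sample: Sample) -> float:
    """Hypothetical stand-in scorer: share of response words in a tiny lexicon."""
    lexicon = {"sorry", "understand", "feel", "hear", "tough", "glad"}
    words = tokens(sample.response)
    if not words:
        return 0.0
    return sum(w in lexicon for w in words) / len(words)


def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two texts, in [0, 1]."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def select(candidates: list[Sample],
           min_quality: float = 0.2,
           max_similarity: float = 0.8) -> list[Sample]:
    """Keep samples above the quality threshold, skipping near-duplicates."""
    kept: list[Sample] = []
    # Visit the best-scoring candidates first so duplicates lose to better ones.
    for s in sorted(candidates, key=quality_score, reverse=True):
        if quality_score(s) < min_quality:
            continue  # quality filter: discard low-scoring samples
        if any(jaccard(s.response, k.response) > max_similarity for k in kept):
            continue  # diversity filter: too close to a sample already kept
        kept.append(s)
    return kept
```

In a real pipeline the lexicon scorer would presumably be replaced by a learned or LLM-based judge, and the word-overlap check by an embedding-based similarity; the greedy keep-or-skip loop is only one simple way to combine the two filters.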

Code & Models

Repositories

aurora-slz/synth-empathy (PyTorch, official)

Taxonomy

Topics: Mental Health Research Topics