Can Large Language Models Generate Effective Datasets for Emotion Recognition in Conversations?

Burak Can Kaplan; Hugo Cesar De Castro Carneiro; Stefan Wermter

arXiv:2508.05474·cs.AI·August 8, 2025

TL;DR

This paper explores using a small, resource-efficient Large Language Model to generate diverse emotion recognition datasets, improving model robustness and performance on existing benchmarks.

Contribution

The paper introduces a novel approach to synthesizing emotion recognition in conversations (ERC) datasets with a small LLM, addressing data scarcity and bias in existing ERC data.

Findings

01. Generated datasets improve ERC classifier robustness.

02. Models trained on synthetic data outperform baselines.

03. Synthetic data helps mitigate label imbalance effects.

Abstract

Emotion recognition in conversations (ERC) focuses on identifying emotion shifts within interactions, representing a significant step toward advancing machine intelligence. However, ERC data remains scarce, and existing datasets face numerous challenges due to their highly biased sources and the inherent subjectivity of soft labels. Even though Large Language Models (LLMs) have demonstrated their quality in many affective tasks, they are typically expensive to train, and their application to ERC tasks, particularly in data generation, remains limited. To address these challenges, we employ a small, resource-efficient, and general-purpose LLM to synthesize ERC datasets with diverse properties, supplementing the three most widely used ERC benchmarks. We generate six novel datasets, with two tailored to enhance each benchmark. We evaluate the utility of these datasets to (1) supplement…
