LLM supervised Pre-training for Multimodal Emotion Recognition in Conversations

Soumya Dutta; Sriram Ganapathy

arXiv:2501.11468·eess.AS·January 22, 2025

Open Access

TL;DR

This paper introduces a multimodal emotion recognition approach that leverages unsupervised speech transcripts, LLM-guided pseudo-labeling, and hierarchical training to improve accuracy on conversational datasets.

Contribution

The paper proposes a hierarchical training method for a speech-text emotion recognition model, in which the text embeddings come from a model pre-trained on LLM pseudo-labeled transcripts.

Findings

01. Achieves state-of-the-art results on IEMOCAP and MELD datasets.

02. Improves emotion recognition accuracy over existing benchmarks.

03. Effectively integrates speech and text modalities for conversational emotion analysis.

Abstract

Emotion recognition in conversations (ERC) is challenging due to the multimodal nature of the emotion expression. In this paper, we propose to pretrain a text-based recognition model from unsupervised speech transcripts with LLM guidance. These transcriptions are obtained from a raw speech dataset with a pre-trained ASR system. A text LLM model is queried to provide pseudo-labels for these transcripts, and these pseudo-labeled transcripts are subsequently used for learning an utterance level text-based emotion recognition model. We use the utterance level text embeddings for emotion recognition in conversations along with speech embeddings obtained from a recently proposed pre-trained model. A hierarchical way of training the speech-text model is proposed, keeping in mind the conversational nature of the dataset. We perform experiments on three established datasets, namely, IEMOCAP,…
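The pseudo-labeling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the label set, the `query_llm` callable, the keyword-based stand-in for an LLM, and the choice to drop answers outside the label set are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical label set; the abstract does not state which labels are used.
EMOTIONS = ["angry", "happy", "neutral", "sad"]


@dataclass
class PseudoLabeledUtterance:
    transcript: str
    label: str


def pseudo_label(
    transcripts: List[str],
    query_llm: Callable[[str], str],
    labels: List[str] = EMOTIONS,
) -> List[PseudoLabeledUtterance]:
    """Ask a labeler for one emotion per transcript and keep valid answers.

    Dropping answers outside the label set is an assumption of this sketch.
    """
    labeled = []
    for text in transcripts:
        answer = query_llm(text).strip().lower()
        if answer in labels:
            labeled.append(PseudoLabeledUtterance(text, answer))
    return labeled


def keyword_stub_llm(text: str) -> str:
    """Stand-in for a real LLM query so the sketch runs offline (hypothetical)."""
    lowered = text.lower()
    if "love" in lowered or "great" in lowered:
        return "happy"
    if "hate" in lowered:
        return "angry"
    if "miss" in lowered:
        return "sad"
    return "unsure"  # not in EMOTIONS, so this utterance is filtered out


# In the paper's setting these would be ASR outputs from raw speech.
asr_transcripts = [
    "i love this song",
    "i hate waiting",
    "the meeting is at noon",
]
dataset = pseudo_label(asr_transcripts, keyword_stub_llm)
for item in dataset:
    print(item.label, "|", item.transcript)
```

The resulting pseudo-labeled pairs would then serve as training data for an utterance-level text emotion classifier; swapping `keyword_stub_llm` for a call to an actual LLM is left to the reader.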

Taxonomy

Topics: Emotion and Mood Recognition