Multi-Task Learning with Auxiliary Speaker Identification for Conversational Emotion Recognition

Jingye Li; Meishan Zhang; Donghong Ji; Yijiang Liu

arXiv:2003.01478·cs.CL·March 6, 2020·22 cites

Open Access

TL;DR

This paper introduces a multi-task learning approach that uses speaker identification as an auxiliary task to improve conversational emotion recognition by learning speaker-aware contextual representations, achieving state-of-the-art results.

Contribution

The novel integration of speaker identification as an auxiliary task enhances utterance representations for better emotion recognition in conversations.

Findings

01

Achieved new state-of-the-art results on two benchmark datasets.

02

Demonstrated effectiveness of speaker-aware representations in CER.

03

Validated the approach's superiority over existing methods.

Abstract

Conversational emotion recognition (CER) has attracted increasing interest in the natural language processing (NLP) community. Unlike vanilla emotion recognition, CER requires effective speaker-sensitive utterance representations, which is one of its major challenges. In this paper, we exploit speaker identification (SI) as an auxiliary task to enhance the utterance representation in conversations. With this method, we can learn better speaker-aware contextual representations from the additional SI corpus. Experiments on two benchmark datasets demonstrate that the proposed architecture is highly effective for CER, obtaining new state-of-the-art results on both.
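
The abstract does not spell out how the CER and SI objectives are combined. As a rough illustration only, the sketch below shows one common way to train a main task with an auxiliary task: a weighted sum of two cross-entropy losses computed from heads that share an utterance encoder. The function names and the `aux_weight` parameter are hypothetical, and the sketch assumes each utterance carries both an emotion label and a speaker label, which may differ from the paper's actual setup (the paper draws on an additional SI corpus).

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, gold):
    """Negative log-likelihood of the gold class index."""
    return -math.log(softmax(logits)[gold])


def joint_loss(emotion_logits, emotion_gold,
               speaker_logits, speaker_gold, aux_weight=0.5):
    """Weighted multi-task loss (assumed form, not taken from the paper).

    emotion_logits and speaker_logits are assumed to come from two
    task-specific heads on top of one shared utterance encoder, so that
    gradients from the auxiliary SI term also update the shared encoder.
    """
    cer_loss = cross_entropy(emotion_logits, emotion_gold)
    si_loss = cross_entropy(speaker_logits, speaker_gold)
    return cer_loss + aux_weight * si_loss


# Example: one utterance, 3 emotion classes, 2 speakers.
loss = joint_loss([2.0, 0.1, -1.0], 0, [0.3, 1.2], 1, aux_weight=0.5)
```

Setting `aux_weight` to 0 in this sketch recovers the single-task CER loss, which makes it easy to compare training with and without the auxiliary signal.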

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Emotion and Mood Recognition · Speech Recognition and Synthesis