TelME: Teacher-leading Multimodal Fusion Network for Emotion Recognition in Conversation

Taeyang Yun; Hyunkuk Lim; Jeonghwan Lee; Min Song

arXiv:2401.12987 · cs.CL · April 2, 2024 · 1 citation

TL;DR

This paper introduces TelME, a novel multimodal fusion network for emotion recognition in conversations that leverages teacher-student knowledge distillation and shifting fusion to improve the use of audio, visual, and text modalities.

Contribution

The paper proposes a teacher-leading multimodal fusion framework that combines cross-modal knowledge distillation with shifting fusion, achieving state-of-the-art results on the MELD dataset for ERC.

Findings

1. TelME outperforms existing models on the MELD dataset.
2. Knowledge distillation strengthens the contribution of the non-verbal (audio and visual) modalities.
3. Shifting fusion effectively combines multimodal features.

Abstract

Emotion Recognition in Conversation (ERC) plays a crucial role in enabling dialogue systems to effectively respond to user requests. The emotions in a conversation can be identified by the representations from various modalities, such as audio, visual, and text. However, due to the weak contribution of non-verbal modalities to recognize emotions, multimodal ERC has always been considered a challenging task. In this paper, we propose Teacher-leading Multimodal fusion network for ERC (TelME). TelME incorporates cross-modal knowledge distillation to transfer information from a language model acting as the teacher to the non-verbal students, thereby optimizing the efficacy of the weak modalities. We then combine multimodal features using a shifting fusion approach in which student networks support the teacher. TelME achieves state-of-the-art performance in MELD, a multi-speaker conversation…
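The cross-modal distillation mentioned in the abstract can be illustrated with a minimal sketch. It assumes a generic response-based distillation loss (KL divergence between temperature-softened class distributions), which is a common formulation but is not confirmed by this page to be the loss TelME uses. The function names, the temperature value, and the example logits are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: generic response-based distillation, not necessarily
    the objective used in TelME.
    """
    p = softmax(teacher_logits, temperature)  # teacher (text) distribution
    q = softmax(student_logits, temperature)  # student (audio or visual) distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical logits over three emotion classes for a single utterance.
text_teacher = [2.0, 0.5, -1.0]
audio_student = [0.3, 0.2, 0.1]

loss_far = distillation_loss(text_teacher, audio_student)
loss_same = distillation_loss(text_teacher, text_teacher)
```

In this sketch the loss is zero when the student already matches the teacher and grows as the two distributions diverge, so minimizing it pulls the weaker non-verbal student toward the text teacher's predictions.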

Code & Models

Repositories

yuntaeyang/telme (PyTorch, official)

Videos

TelME: Teacher-leading Multimodal Fusion Network for Emotion Recognition in Conversation · Underline

Taxonomy

Topics: Speech and dialogue systems · Emotion and Mood Recognition · Speech Recognition and Synthesis

Methods: Knowledge Distillation