Multi-modal Anchor Gated Transformer with Knowledge Distillation for Emotion Recognition in Conversation

Jie Li; Shifei Ding; Lili Guo; Xuan Li

arXiv:2506.18716 · cs.LG · June 24, 2025

TL;DR

This paper introduces MAGTKD, a multi-modal anchor gated transformer with knowledge distillation for emotion recognition in conversation (ERC). It improves the integration of modalities and achieves state-of-the-art results on the IEMOCAP and MELD benchmarks.

Contribution

The paper proposes a multi-modal anchor gated transformer that uses knowledge distillation and prompt learning to enhance modality-specific representations for ERC.

Findings

1. Achieves state-of-the-art performance on the IEMOCAP and MELD datasets.
2. Demonstrates the effectiveness of knowledge distillation in strengthening weaker modalities.
3. Shows improved integration of utterance-level representations across modalities.

Abstract

Emotion Recognition in Conversation (ERC) aims to detect the emotions of individual utterances within a conversation. Generating efficient and modality-specific representations for each utterance remains a significant challenge. Previous studies have proposed various models to integrate features extracted using different modality-specific encoders. However, they neglect the varying contributions of modalities to this task and introduce high complexity by aligning modalities at the frame level. To address these challenges, we propose the Multi-modal Anchor Gated Transformer with Knowledge Distillation (MAGTKD) for the ERC task. Specifically, prompt learning is employed to enhance textual modality representations, while knowledge distillation is utilized to strengthen representations of weaker modalities. Furthermore, we introduce a multi-modal anchor gated transformer to effectively…
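The abstract does not give MAGTKD's distillation objective. For reference only, the sketch below shows the standard temperature-scaled distillation loss (in the style of Hinton et al.) applied across modalities. It assumes a stronger modality (here, text) acts as the teacher and a weaker one (here, audio) as the student for a single utterance. The function names, the temperature, the `alpha` weighting, and the example logits are all hypothetical illustrations, not the paper's formulation.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() from overflowing."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between softened distributions, scaled by T^2
    so the loss magnitude stays comparable across temperatures."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def student_objective(student_logits, teacher_logits, label, alpha=0.5, temperature=2.0):
    """Hypothetical combined loss for the weaker (student) modality:
    (1 - alpha) * cross-entropy on the gold emotion label
    + alpha * distillation from the stronger (teacher) modality."""
    ce = -math.log(softmax(student_logits)[label])
    kd = distillation_loss(teacher_logits, student_logits, temperature)
    return (1 - alpha) * ce + alpha * kd


# Hypothetical logits over three emotion classes for one utterance.
text_logits = [2.5, 0.3, -1.0]   # teacher: text branch
audio_logits = [0.8, 0.6, 0.1]   # student: audio branch
print(student_objective(audio_logits, text_logits, label=0))
```

Softening both distributions with a temperature above 1 exposes the teacher's relative confidence across non-target emotions, which is the signal a weaker modality can learn from beyond the hard label.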
