ML-SAN: Multi-Level Speaker-Adaptive Network for Emotion Recognition in Conversations
Kexue Wang, Yinfeng Yu, Liejun Wang

TL;DR
This paper introduces ML-SAN, a multi-level adaptive network that improves emotion recognition in conversations by addressing individual expressive differences through a three-stage speaker adaptation process.
Contribution
The novel ML-SAN model employs a three-stage adaptive process to effectively handle speaker variability in multimodal emotion recognition tasks.
Findings
ML-SAN outperforms existing models on MELD and IEMOCAP datasets.
It achieves better recognition of tail (less frequent) sentiment categories.
It effectively manages speaker diversity in real-world scenarios.
Abstract
To establish empathy with machines, it is essential to fully understand human emotional changes. However, research in multimodal emotion recognition often overlooks one problem: individual expressive traits vary significantly, so different people may express the same emotion differently. This is easy to see in daily life: some people express "happiness" through their facial expressions and words, while others hide it or express it through their actions. Both are expressions of "happiness," but such differences in emotional expression are still too difficult for machines to distinguish. Current emotion recognition remains at a "static" level, using a single recognition model to identify all emotional styles. This "simplification" often degrades recognition results, especially in multi-turn dialogues. To address this problem,…
