Test-Time Adaptation for Speech Emotion Recognition
Jiaheng Dong, Hong Jia, Ting Dang

TL;DR
This paper systematically evaluates test-time adaptation methods for speech emotion recognition, revealing that backpropagation-free approaches are most effective and that methods relying on confident pseudo-labels often fail due to emotional ambiguity.
Contribution
First comprehensive evaluation of TTA methods for SER, highlighting the effectiveness of backpropagation-free techniques and the limitations of entropy minimization and pseudo-labeling.
Findings
Backpropagation-free TTA methods outperform others in SER.
Entropy minimization and pseudo-labeling often fail due to emotional ambiguity.
No single TTA method is universally effective across all tasks.
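To make the second finding concrete, the sketch below shows how entropy and confidence-thresholded pseudo-labeling treat a clear-cut prediction versus an emotionally ambiguous one. It is an illustration only, not the procedure used in the paper: the four-class logits, the 0.9 threshold, and the helper names (softmax, entropy, pseudo_label) are all assumptions chosen for the example.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; higher means a less certain prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def pseudo_label(probs, threshold=0.9):
    """Return the argmax class if its probability clears the threshold, else None.

    The 0.9 threshold is a hypothetical value, not one taken from the paper.
    """
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best if probs[best] >= threshold else None

# Hypothetical logits over four emotion classes (class order is arbitrary here).
confident = softmax([6.0, 0.5, 0.2, 0.1])   # one class clearly dominates
ambiguous = softmax([1.2, 1.0, 0.3, 0.1])   # two classes are nearly tied

for name, probs in [("confident", confident), ("ambiguous", ambiguous)]:
    print(name, "entropy:", round(entropy(probs), 3), "pseudo-label:", pseudo_label(probs))
```

In this toy case the ambiguous utterance has much higher entropy and receives no pseudo-label. If many target utterances look like this, a method that relies on confident pseudo-labels has little signal to adapt on, and minimizing entropy would push the model toward a certainty that the utterance may not warrant.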
Abstract
The practical utility of Speech Emotion Recognition (SER) systems is undermined by their fragility to domain shifts, such as speaker variability, the distinction between acted and naturalistic emotions, and cross-corpus variations. While domain adaptation and fine-tuning are widely studied, they require either source data or labelled target data, which are often unavailable or raise privacy concerns in SER. Test-time adaptation (TTA) bridges this gap by adapting models at inference using only unlabeled target data. Yet, having been predominantly designed for image classification and speech recognition, the efficacy of TTA for mitigating the unique domain shifts in SER has not been investigated. In this paper, we present the first systematic evaluation and comparison covering 11 TTA methods across three representative SER tasks. The results indicate that backpropagation-free TTA methods…
Taxonomy
Topics: Emotion and Mood Recognition · Speech Recognition and Synthesis · Face and Expression Recognition
