PCQ: Emotion Recognition in Speech via Progressive Channel Querying
Xincheng Wang, Liejun Wang, Yinfeng Yu, Xinxin Jiao

TL;DR
The paper introduces PCQ, a novel method for speech emotion recognition that dynamically models long-term emotional context by progressive channel querying, leading to improved accuracy on benchmark datasets.
Contribution
The paper proposes a pioneering progressive channel querying approach for speech emotion recognition (SER) that effectively captures long-term temporal correlations and emotional nuances in speech.
Findings
Improves weighted average (WA) accuracy by 3.98% on IEMOCAP.
Improves unweighted average (UA) accuracy by 5.83% on EMODB.
Outperforms the baseline methods it is compared against.
Abstract
In human-computer interaction (HCI), Speech Emotion Recognition (SER) is a key technology for understanding human intentions and emotions. Traditional SER methods struggle to effectively capture the long-term temporal correlations and dynamic variations in complex emotional expressions. To overcome these limitations, we introduce the PCQ method, a pioneering approach for SER via Progressive Channel Querying. This method can drill down layer by layer in the channel dimension through the channel query technique to achieve dynamic modeling of long-term contextual information of emotions. This multi-level analysis gives the PCQ method an edge in capturing the nuances of human emotions. Experimental results show that our model improves the weighted average (WA) accuracy by 3.98% and 3.45% and the unweighted average (UA) accuracy by 5.67% and 5.83% on the…
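The results above are reported as weighted average (WA) and unweighted average (UA) accuracy. In SER work these usually denote overall accuracy and the mean of per-class recalls, respectively. This summary does not state the paper's exact evaluation protocol (for example, how folds or speakers are split), so the Python sketch below only illustrates those common definitions; the function names `weighted_accuracy` and `unweighted_accuracy` are hypothetical and the toy labels are not data from the paper.

```python
from collections import Counter
from typing import Hashable, Sequence


def weighted_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """WA as commonly used in SER (assumed definition): fraction of all utterances labelled correctly."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def unweighted_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """UA as commonly used in SER (assumed definition): mean per-class recall, so each emotion counts equally."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    class_totals = Counter(y_true)  # number of utterances per true emotion class
    class_hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class
    recalls = [class_hits[c] / class_totals[c] for c in class_totals]
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Toy, class-imbalanced example (hypothetical labels, not from the paper)
    y_true = ["neutral"] * 6 + ["angry"] * 2 + ["sad"] * 2
    y_pred = ["neutral"] * 6 + ["neutral", "angry"] + ["neutral", "neutral"]
    print(weighted_accuracy(y_true, y_pred))    # 0.7
    print(unweighted_accuracy(y_true, y_pred))  # 0.5
```

In the toy example, a classifier that leans toward the majority "neutral" class gets 7 of 10 utterances right (WA = 0.7), but its per-class recalls are 1.0, 0.5 and 0.0 (UA = 0.5). Because UA weights every emotion class equally, SER papers typically report both metrics.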
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis
