Investigating the Impact of Word Informativeness on Speech Emotion Recognition
Sofoklis Kakouros

TL;DR
This paper explores how selecting speech segments by word informativeness, as estimated by a pre-trained language model, improves speech emotion recognition accuracy by focusing acoustic analysis on the semantically important parts of an utterance.
Contribution
It introduces a novel method that uses pre-trained language models to identify informative segments for acoustic feature extraction in speech emotion recognition.
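The summary does not state the exact informativeness measure or selection rule. The Python sketch below assumes that informativeness is word surprisal, -log2 p(word | context), and applies a hypothetical top-k rule to choose segments. The `Word` class, `select_informative`, and the probabilities are illustrative only. In practice the probabilities would come from a pre-trained language model and the word timings from a forced aligner, and neither step is shown here.

```python
import math
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # start time in seconds (assumed to come from a forced aligner)
    end: float    # end time in seconds
    prob: float   # p(word | preceding context), assumed to come from a language model


def surprisal(p: float) -> float:
    """Informativeness as surprisal in bits: -log2 p(word | context)."""
    return -math.log2(p)


def select_informative(words: list[Word], k: int) -> list[Word]:
    """Hypothetical rule: keep the k most surprising words, in temporal order."""
    ranked = sorted(words, key=lambda w: surprisal(w.prob), reverse=True)[:k]
    return sorted(ranked, key=lambda w: w.start)


# Toy utterance with made-up probabilities (not from the paper).
words = [
    Word("i", 0.00, 0.12, 0.20),
    Word("really", 0.12, 0.45, 0.05),
    Word("hate", 0.45, 0.80, 0.01),
    Word("mondays", 0.80, 1.40, 0.002),
]
selected = select_informative(words, k=2)
print([w.text for w in selected])  # ['hate', 'mondays']
```

The start and end times of the selected words then define the segments over which acoustic features are computed. A threshold on surprisal would be an equally plausible alternative to the fixed k used here.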
Findings
Improved emotion recognition accuracy using informative segments
Segment selection based on word informativeness enhances acoustic feature relevance
Method outperforms traditional long-form statistical approaches
Abstract
In emotion recognition from speech, a key challenge lies in identifying speech signal segments that carry the most relevant acoustic variations for discerning specific emotions. Traditional approaches compute functionals for features such as energy and F0 over entire sentences or longer speech portions, potentially missing essential fine-grained variation in the long-form statistics. This research investigates the use of word informativeness, derived from a pre-trained language model, to identify semantically important segments. Acoustic features are then computed exclusively for these identified segments, enhancing emotion recognition accuracy. The methodology utilizes standard acoustic prosodic features, their functionals, and self-supervised representations. Results indicate a notable improvement in recognition performance when features are computed on segments selected based on word…
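To show how restricting functionals to selected segments changes the features, the sketch below computes common functionals (mean, standard deviation, minimum, maximum, range) over a frame-level contour twice: once for the whole utterance and once for hypothetical informative segments. The 100 frames-per-second rate, the synthetic F0 values, the segment boundaries, and the helper names are all assumptions for illustration. They are not the paper's feature set or data.

```python
import statistics

FRAME_RATE = 100  # frames per second (hypothetical 10 ms hop)


def functionals(values):
    """Common utterance-level functionals over a frame-level contour."""
    return {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }


def segment_frames(contour, segments, frame_rate=FRAME_RATE):
    """Concatenate the frames inside each (start, end) segment given in seconds."""
    frames = []
    for start, end in segments:
        # round() avoids off-by-one frame indices from floating-point products
        frames.extend(contour[round(start * frame_rate):round(end * frame_rate)])
    return frames


# Synthetic F0 contour in Hz: a flat stretch, then two livelier words (1.4 s total).
contour = [100.0] * 45 + [200.0] * 35 + [150.0] * 60
informative_segments = [(0.45, 0.80), (0.80, 1.40)]  # hypothetical word boundaries

selected_frames = segment_frames(contour, informative_segments)
print("whole utterance:", functionals(contour))
print("informative segments:", functionals(selected_frames))
```

Here the segment-level mean and range differ from the whole-utterance values, because the flat stretch outside the selected words no longer dilutes the statistics. This is the kind of fine-grained variation that long-form statistics can miss.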
Taxonomy
Topics: Emotion and Mood Recognition
