TL;DR
This paper advances elderly emotion recognition by combining acoustic and linguistic analysis, introducing strategies to improve model generalization and performance with limited labeled data.
Contribution
It presents a bi-modal framework for elderly emotion recognition that models arousal and valence using acoustic and linguistic features, and proposes task-specific dictionaries and fusion strategies to improve generalization and performance when labeled data is scarce.
Findings
Task-specific dictionaries improve linguistic model performance with limited data
Fusion strategies enhance generalization across datasets
Bi-modal approach outperforms uni-modal models
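The summary does not say which fusion strategies the paper uses. As a rough illustration of combining an acoustic and a linguistic model for one ternary task, the sketch below applies weighted score averaging, a common late-fusion scheme chosen here as an assumption. The label names, the `fuse_scores` and `predict` helpers, and the `weight` parameter are hypothetical and not taken from the paper.

```python
from typing import List, Sequence

# Illustrative names for the three classes of a ternary task (assumption).
LABELS = ("low", "medium", "high")


def fuse_scores(acoustic: Sequence[float],
                linguistic: Sequence[float],
                weight: float = 0.5) -> List[float]:
    """Weighted average of two per-class score vectors.

    `weight` is the share given to the acoustic model; the linguistic
    model receives `1 - weight`. This is a generic late-fusion sketch,
    not the paper's method.
    """
    if len(acoustic) != len(linguistic):
        raise ValueError("score vectors must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * a + (1.0 - weight) * l
            for a, l in zip(acoustic, linguistic)]


def predict(acoustic: Sequence[float],
            linguistic: Sequence[float],
            weight: float = 0.5) -> str:
    """Return the label with the highest fused score."""
    fused = fuse_scores(acoustic, linguistic, weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return LABELS[best]


if __name__ == "__main__":
    acoustic_scores = [0.2, 0.5, 0.3]    # made-up acoustic model output
    linguistic_scores = [0.7, 0.2, 0.1]  # made-up linguistic model output
    print(predict(acoustic_scores, linguistic_scores, weight=0.5))
```

In a sketch like this, `weight` would typically be tuned on held-out data. When development and test data are mismatched, a fixed equal weighting is sometimes preferred because it is less likely to overfit the development set.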
Abstract
Acoustic and linguistic analysis for elderly emotion recognition is an under-studied and challenging research direction, but essential for the creation of digital assistants for the elderly, as well as unobtrusive telemonitoring of the elderly in their residences for mental healthcare purposes. This paper presents our contribution to the INTERSPEECH 2020 Computational Paralinguistics Challenge (ComParE) - Elderly Emotion Sub-Challenge, which comprises two ternary classification tasks for arousal and valence recognition. We propose a bi-modal framework, where these tasks are modeled using state-of-the-art acoustic and linguistic features, respectively. In this study, we demonstrate that exploiting task-specific dictionaries and resources can boost the performance of linguistic models when the amount of labeled data is small. Observing a high mismatch between development and test set…
