Aalto's End-to-End DNN systems for the INTERSPEECH 2020 Computational Paralinguistics Challenge
Tamás Grósz, Mittul Singh, Sudarsana Reddy Kadiri, Hemant Kathania, Mikko Kurimo

TL;DR
This paper explores ensemble and task-specific modifications of end-to-end neural networks to improve performance on diverse paralinguistic tasks in the INTERSPEECH 2020 challenge, demonstrating significant gains over single models and feature-engineered baselines.
Contribution
It introduces task-specific ensemble strategies and modifications for E2E models, tailored to three distinct sub-challenges in paralinguistic speech analysis.
Findings
Ensemble models outperform single E2E models on all sub-challenges.
Multi-loss strategies improve performance on the breathing sub-challenge.
E2E systems without feature engineering are competitive and enhance baseline results.
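To make the multi-loss idea concrete, here is a minimal sketch of a combined regression objective. The summary above does not say which losses the authors combine, so the pairing of mean squared error with a Pearson-correlation term, the function names `pearson_r` and `multi_loss`, and the weight `alpha` are all assumptions chosen for illustration.

```python
import numpy as np


def pearson_r(pred, target):
    """Pearson correlation between two 1-D signals (0.0 if either is constant)."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    pc = pred - pred.mean()
    tc = target - target.mean()
    denom = np.sqrt((pc ** 2).sum() * (tc ** 2).sum())
    return float((pc * tc).sum() / denom) if denom > 0 else 0.0


def multi_loss(pred, target, alpha=0.5):
    """Hypothetical multi-loss: weighted sum of MSE and (1 - Pearson r).

    `alpha` is an illustrative weight, not a value taken from the paper.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    mse = float(np.mean((pred - target) ** 2))
    corr_loss = 1.0 - pearson_r(pred, target)
    return alpha * mse + (1.0 - alpha) * corr_loss


# Example: a prediction that matches the belt signal exactly has zero loss.
belt_signal = [0.1, 0.4, 0.9, 0.4, 0.1]
print(multi_loss(belt_signal, belt_signal))
```

In a training setup the same combination would be written with a differentiable framework; NumPy is used here only to keep the sketch self-contained.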
Abstract
End-to-end neural network models (E2E) have shown significant performance benefits on different INTERSPEECH ComParE tasks. Prior work has applied either a single instance of an E2E model for a task or the same E2E architecture for different tasks. However, a single model can be unstable, and reusing the same architecture across tasks under-utilizes task-specific information. On ComParE 2020 tasks, we investigate applying an ensemble of E2E models for robust performance and developing task-specific modifications for each task. ComParE 2020 introduces three sub-challenges: the breathing sub-challenge, to predict the output of a respiratory belt worn by a patient while speaking; the elderly sub-challenge, to estimate the elderly speaker's arousal and valence levels; and the mask sub-challenge, to classify whether the speaker is wearing a mask. On each of these tasks, an ensemble outperforms the single…
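The ensemble idea described in the abstract can be sketched as follows. The excerpt does not specify how the ensemble members are combined, so simple averaging of per-model outputs is an assumption here, and `ensemble_average` and `ensemble_classify` are hypothetical helper names.

```python
import numpy as np


def ensemble_average(member_outputs):
    """Average the outputs of several models (assumed combination rule).

    member_outputs: list of arrays with identical shape, one per model.
    """
    stacked = np.stack([np.asarray(o, dtype=float) for o in member_outputs])
    return stacked.mean(axis=0)


def ensemble_classify(member_probs):
    """Pick the class with the highest averaged probability per utterance."""
    return ensemble_average(member_probs).argmax(axis=-1)


# Three hypothetical models scoring two utterances for a two-class task
# (for example, mask vs. no mask); the numbers are made up.
model_a = [[0.9, 0.1], [0.4, 0.6]]
model_b = [[0.6, 0.4], [0.55, 0.45]]
model_c = [[0.2, 0.8], [0.3, 0.7]]
print(ensemble_classify([model_a, model_b, model_c]))
```

Averaging smooths out the run-to-run variation of individual models, which is one plausible reading of the "robust performance" motivation; the same `ensemble_average` helper applies unchanged to regression outputs such as a predicted breathing signal.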
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems
