Aalto's End-to-End DNN systems for the INTERSPEECH 2020 Computational Paralinguistics Challenge

Tamás Grósz; Mittul Singh; Sudarsana Reddy Kadiri; Hemant Kathania; Mikko Kurimo

arXiv:2008.02689·eess.AS·August 7, 2020

Open Access

TL;DR

This paper explores ensemble and task-specific modifications of end-to-end neural networks to improve performance on diverse paralinguistic tasks in the INTERSPEECH 2020 challenge, demonstrating significant gains over single models and feature-engineered baselines.

Contribution

It introduces task-specific ensemble strategies and modifications for E2E models, tailored to three distinct sub-challenges in paralinguistic speech analysis.

Findings

01. Ensemble models outperform single E2E models on all sub-challenges.

02. Multi-loss strategies improve performance on the breathing sub-challenge.

03. E2E systems without feature engineering are competitive and enhance baseline results.

Abstract

End-to-end neural network models (E2E) have shown significant performance benefits on different INTERSPEECH ComParE tasks. Prior work has applied either a single instance of an E2E model to a task or the same E2E architecture to different tasks. However, a single model can be unstable, and reusing the same architecture across tasks under-utilizes task-specific information. On the ComParE 2020 tasks, we investigate applying an ensemble of E2E models for robust performance and developing task-specific modifications for each task. ComParE 2020 introduces three sub-challenges: the breathing sub-challenge, to predict the output of a respiratory belt worn by a patient while speaking; the elderly sub-challenge, to estimate the elderly speaker's arousal and valence levels; and the mask sub-challenge, to classify whether the speaker is wearing a mask. On each of these tasks, an ensemble outperforms the single…
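As a rough illustration of the ensembling idea in the abstract, the sketch below averages per-class probabilities from several independently trained models and picks the highest-scoring class, using the binary mask task as the example. The fusion rule (a plain mean), the function names `ensemble_average` and `predict`, and all probability values are assumptions made for illustration; the abstract does not state how the authors combine their models.

```python
from statistics import fmean


def ensemble_average(model_probs):
    """Average class probabilities across models, utterance by utterance.

    model_probs: list over models; each model gives a list over utterances;
    each utterance is a list of class probabilities.
    """
    n_utts = len(model_probs[0])
    if any(len(m) != n_utts for m in model_probs):
        raise ValueError("all models must score the same utterances")
    averaged = []
    for utt_idx in range(n_utts):
        per_model = [m[utt_idx] for m in model_probs]
        # zip(*per_model) groups the scores of each class across models
        averaged.append([fmean(scores) for scores in zip(*per_model)])
    return averaged


def predict(avg_probs):
    """Return the index of the most probable class for each utterance."""
    return [max(range(len(p)), key=p.__getitem__) for p in avg_probs]


# Hypothetical outputs of three models on two utterances
# (class 0 = no mask, class 1 = mask); the values are made up.
model_probs = [
    [[0.6, 0.4], [0.2, 0.8]],
    [[0.4, 0.6], [0.3, 0.7]],
    [[0.7, 0.3], [0.1, 0.9]],
]
avg = ensemble_average(model_probs)
labels = predict(avg)
```

In this toy input the second model disagrees with the other two on the first utterance, and the averaged scores follow the majority. This is the kind of smoothing over single-model instability that motivates an ensemble.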

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems