End-to-end Ensemble-based Feature Selection for Paralinguistics Tasks
Tamás Grósz, Mittul Singh, Sudarsana Reddy Kadiri, Hemant Kathania, Mikko Kurimo

TL;DR
This paper introduces an ensemble-based feature selection method that cuts inference time by 25-32% in paralinguistic tasks such as mask detection and breathing state prediction, which could facilitate real-time telemedicine applications.
Contribution
It proposes an output-gradient-based feature selection approach that enables the creation of faster, memory-efficient neural network ensembles without sacrificing accuracy.
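This summary does not spell out the paper's exact procedure, so the following is only a minimal sketch of the general idea behind gradient-based feature selection: score each input feature by the average magnitude of the model output's gradient with respect to that feature, then keep the top-scoring features. The tiny tanh network, its weights, the sample data, and the function names (`forward_and_input_grad`, `mean_abs_saliency`, `select_top_k`) are all hypothetical and chosen only for illustration.

```python
import math


def forward_and_input_grad(x, W1, b1, w2):
    """Return the scalar output of a tiny tanh network and d(output)/d(x).

    Hypothetical toy model: output = sum_k w2[k] * tanh(W1[k] . x + b1[k]).
    """
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W1, b1)
    ]
    output = sum(v * h for v, h in zip(w2, hidden))
    # Chain rule: d(output)/d(x_j) = sum_k w2[k] * (1 - h_k^2) * W1[k][j]
    grad = [
        sum(w2[k] * (1.0 - hidden[k] ** 2) * W1[k][j] for k in range(len(hidden)))
        for j in range(len(x))
    ]
    return output, grad


def mean_abs_saliency(samples, W1, b1, w2):
    """Average the absolute output gradient per input feature over all samples."""
    saliency = [0.0] * len(samples[0])
    for x in samples:
        _, grad = forward_and_input_grad(x, W1, b1, w2)
        for j, g in enumerate(grad):
            saliency[j] += abs(g) / len(samples)
    return saliency


def select_top_k(saliency, k):
    """Return the indices of the k highest-scoring features, in ascending order."""
    order = sorted(range(len(saliency)), key=lambda j: saliency[j], reverse=True)
    return sorted(order[:k])


# Illustrative (made-up) weights: feature 2 is not connected to the network at all.
W1 = [[2.0, 0.1, 0.0, -1.0],
      [-1.5, 0.05, 0.0, 0.8]]
b1 = [0.0, 0.1]
w2 = [1.0, -1.0]
samples = [[0.1, 0.2, 0.3, 0.4],
           [-0.2, 0.5, 0.1, 0.0],
           [0.0, -0.3, 0.9, -0.1]]

saliency = mean_abs_saliency(samples, W1, b1, w2)
selected = select_top_k(saliency, k=2)
print("saliency per feature:", saliency)
print("selected feature indices:", selected)
```

In a setting like the one the paper describes, a smaller model would then be trained on only the selected features, which is where the savings in inference time and memory would come from. In practice the gradients would come from an automatic differentiation framework rather than a hand-written chain rule.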
Findings
25-32% reduction in inference times
Maintains competitive accuracy with smaller ensembles
Enables real-time telemedicine applications
Abstract
The events of recent years have highlighted the importance of telemedicine solutions which could potentially allow remote treatment and diagnosis. Relatedly, Computational Paralinguistics, a unique subfield of Speech Processing, aims to extract information about the speaker and forms an important part of telemedicine applications. In this work, we focus on two paralinguistic problems: mask detection and breathing state prediction. Solutions developed for these tasks could be invaluable and have the potential to help monitor and limit the spread of a virus like COVID-19. The current state-of-the-art methods proposed for these tasks are ensembles based on deep neural networks like ResNets in conjunction with feature engineering. Although these ensembles can achieve high accuracy, they also have a large footprint and require substantial computational power, reducing portability to devices…
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · COVID-19 diagnosis using AI
Methods: Feature Selection
