End-to-end Ensemble-based Feature Selection for Paralinguistics Tasks

Tamás Grósz; Mittul Singh; Sudarsana Reddy Kadiri; Hemant Kathania; Mikko Kurimo

arXiv:2210.15978·eess.AS·October 31, 2022·1 cites

Open Access

TL;DR

This paper introduces an ensemble-based feature selection method that cuts inference time by 25-32% in paralinguistic tasks such as mask detection and breathing state prediction, which could facilitate real-time telemedicine applications.

Contribution

It proposes an output-gradient-based feature selection approach that enables the creation of faster, memory-efficient neural network ensembles without sacrificing accuracy.
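This page does not spell out how the output gradients are turned into a feature ranking, so the following is only a minimal sketch under stated assumptions: a feature's importance is taken to be the mean absolute gradient of each model's output with respect to that input feature, averaged over ensemble members and samples, and the gradient is estimated with central finite differences rather than backpropagation. All function names (`output_gradient`, `feature_scores`, `select_top_k`, `make_logistic`) are hypothetical and the toy logistic models stand in for real networks.

```python
import math
from typing import Callable, List, Sequence

# A "model" here is any callable mapping a feature vector to a scalar output.
Model = Callable[[Sequence[float]], float]


def output_gradient(model: Model, x: Sequence[float], eps: float = 1e-5) -> List[float]:
    """Central finite-difference estimate of d(output)/d(x_i) for every feature i."""
    grads = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        grads.append((model(hi) - model(lo)) / (2 * eps))
    return grads


def feature_scores(models: Sequence[Model], samples: Sequence[Sequence[float]]) -> List[float]:
    """Mean absolute output gradient per feature, averaged over models and samples.

    Assumption: this averaging rule is illustrative, not taken from the paper.
    """
    scores = [0.0] * len(samples[0])
    for model in models:
        for x in samples:
            for i, g in enumerate(output_gradient(model, x)):
                scores[i] += abs(g)
    total = len(models) * len(samples)
    return [s / total for s in scores]


def select_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring features, returned in ascending index order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def make_logistic(weights: Sequence[float], bias: float = 0.0) -> Model:
    """Toy stand-in for an ensemble member: a single logistic unit."""
    def model(x: Sequence[float]) -> float:
        z = bias + sum(w * v for w, v in zip(weights, x))
        return 1.0 / (1.0 + math.exp(-z))
    return model
```

With two toy members whose weights are large on features 0 and 2 and near zero elsewhere, `select_top_k(feature_scores(models, samples), 2)` keeps exactly those two features; a smaller model could then be trained on the reduced input, which is where an inference-time saving would come from.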

Findings

01. 25-32% reduction in inference times
02. Maintains competitive accuracy with smaller ensembles
03. Enables real-time telemedicine applications

Abstract

The events of recent years have highlighted the importance of telemedicine solutions that could potentially allow remote treatment and diagnosis. Relatedly, Computational Paralinguistics, a unique subfield of Speech Processing, aims to extract information about the speaker and forms an important part of telemedicine applications. In this work, we focus on two paralinguistic problems: mask detection and breathing state prediction. Solutions developed for these tasks could be invaluable and have the potential to help monitor and limit the spread of a virus like COVID-19. The current state-of-the-art methods proposed for these tasks are ensembles based on deep neural networks like ResNets in conjunction with feature engineering. Although these ensembles can achieve high accuracy, they also have a large footprint and require substantial computational power, reducing portability to devices…

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · COVID-19 diagnosis using AI

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Feature Selection