DeepSpectrumLite: A Power-Efficient Transfer Learning Framework for Embedded Speech and Audio Processing from Decentralised Data
Shahin Amiriparian (1), Tobias Hübner (1), Maurice Gerczuk (1), Sandra Ottl (1), Björn W. Schuller (1,2) ((1) EIHW -- Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Germany, (2) GLAM -- Group on Language, Audio, and Music

TL;DR
DeepSpectrumLite is a lightweight transfer learning framework for real-time speech and audio recognition on embedded devices. It fine-tunes pre-trained image CNNs on Mel-spectrograms generated from raw audio and reports state-of-the-art results on paralinguistics tasks with low inference latency.
Contribution
DeepSpectrumLite introduces a novel, resource-efficient transfer learning pipeline for embedded speech and audio processing that combines pre-trained CNNs with on-the-fly spectrogram generation and augmentation.
Findings
Achieves real-time inference with a mean lag of 242.0 ms on a smartphone when a DenseNet121 model is used.
Runs on-device in a decentralised fashion, so audio data does not need to be uploaded.
Obtains state-of-the-art results on paralinguistics tasks.
Abstract
Deep neural speech and audio processing systems have a large number of trainable parameters, a relatively complex architecture, and require a vast amount of training data and computational power. These constraints make it more challenging to integrate such systems into embedded devices and utilise them for real-time, real-world applications. We tackle these limitations by introducing DeepSpectrumLite, an open-source, lightweight transfer learning framework for on-device speech and audio recognition using pre-trained image convolutional neural networks (CNNs). The framework creates and augments Mel-spectrogram plots on-the-fly from raw audio signals which are then used to finetune specific pre-trained CNNs for the target classification task. Subsequently, the whole pipeline can be run in real-time with a mean inference lag of 242.0 ms when a DenseNet121 model is used on a consumer-grade…
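The spectrogram step described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the DeepSpectrumLite implementation: the function names, the frame size, hop, number of Mel bands, and the naive DFT are all assumptions chosen for readability, and the masking function is a generic time/frequency masking stand-in rather than the framework's own augmentation. The sketch also stops at the log-Mel matrix; rendering it as an image plot and feeding it to a pre-trained CNN is left out.

```python
import cmath
import math
import random


def hz_to_mel(hz):
    # HTK-style Mel scale (one common convention; assumed here).
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hann-windowed power spectrum of one frame via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the Mel scale."""
    mel_max = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = sample_rate / n_fft
    bank = []
    for m in range(n_mels):
        lo, centre, hi = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for k in range(n_fft // 2 + 1):
            f = k * bin_hz
            rising = (f - lo) / (centre - lo)
            falling = (hi - f) / (hi - centre)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def log_mel_spectrogram(signal, sample_rate, n_fft=64, hop=32, n_mels=8):
    """Return a [time][mel] matrix of log-Mel energies in dB."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    result = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        power = power_spectrum(signal[start:start + n_fft])
        result.append([
            10.0 * math.log10(sum(w * p for w, p in zip(row, power)) + 1e-10)
            for row in bank
        ])
    return result


def mask_augment(spec, max_time=2, max_mel=2, seed=None):
    """Generic time/frequency masking on a copy of spec (illustrative only)."""
    rng = random.Random(seed)
    floor = min(min(row) for row in spec)
    out = [row[:] for row in spec]
    t_width = rng.randint(0, max_time)
    t0 = rng.randint(0, len(out) - t_width)
    for t in range(t0, t0 + t_width):
        out[t] = [floor] * len(out[t])
    m_width = rng.randint(0, max_mel)
    m0 = rng.randint(0, len(out[0]) - m_width)
    for row in out:
        for m in range(m0, m0 + m_width):
            row[m] = floor
    return out


if __name__ == "__main__":
    sr = 8000  # hypothetical sample rate
    tone = [math.sin(2 * math.pi * 1000 * i / sr) for i in range(512)]
    spec = log_mel_spectrogram(tone, sr)
    augmented = mask_augment(spec, seed=0)
    print(len(spec), "frames x", len(spec[0]), "Mel bands")
```

In practice an FFT routine from a numerical library would replace the naive DFT, and a new random mask would be drawn each time a training example is loaded, so the model rarely sees the same spectrogram twice.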
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
