On the Impact of Quantization and Pruning of Self-Supervised Speech Models for Downstream Speech Recognition Tasks "In-the-Wild"

Arthur Pimentel; Heitor Guimarães; Anderson R. Avila; Mehdi Rezagholizadeh; Tiago H. Falk

arXiv:2309.14462 · eess.AS · September 27, 2023

TL;DR

This paper investigates how quantization and pruning affect the performance of self-supervised speech models, especially under challenging real-world conditions like noise and reverberation, to enable efficient deployment on resource-limited devices.

Contribution

It provides an analysis of the impact of model compression techniques on self-supervised speech models in diverse and challenging acoustic environments.

Findings

01

Quantization and pruning degrade speech recognition accuracy under noisy conditions.

02

Model compression effects vary with different types of acoustic distortions.

03

Results inform the design of resource-efficient speech recognition systems for real-world use.
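
For readers unfamiliar with the two compression techniques named above, the following is a minimal, generic sketch of magnitude pruning followed by symmetric 8-bit quantization on a toy weight list. It is not the authors' implementation: the function names, the 50% sparsity level, and the int8 scheme are illustrative assumptions, and the paper's actual compression settings are not given on this page.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n_prune = int(len(weights) * sparsity)
    pruned = list(weights)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights):
    """Symmetric uniform quantization to integers in [-127, 127].

    Returns the integer codes and the scale needed to map them back.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


# Toy weights (made up for this example).
weights = [0.02, -0.75, 0.31, -0.04, 1.27, 0.009]

pruned = magnitude_prune(weights, sparsity=0.5)  # assumed sparsity level
codes, scale = quantize_int8(pruned)
restored = dequantize(codes, scale)

# The rounding error per weight is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(pruned, restored))
print(pruned, codes, max_error)
```

Pruning removes parameters outright, while quantization keeps every remaining parameter at lower precision, so the two introduce different kinds of approximation error. How each one interacts with noise and reverberation is the question the paper studies.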

Abstract

Recent advances with self-supervised learning have allowed speech recognition systems to achieve state-of-the-art (SOTA) word error rates (WER) while requiring only a fraction of the labeled training data needed by their predecessors. Notwithstanding, while such models achieve SOTA performance in matched train/test conditions, their performance degrades substantially when tested in unseen conditions. To overcome this problem, strategies such as data augmentation and/or domain shift training have been explored. Available models, however, are still too large to be considered for edge speech applications on resource-constrained devices; thus, model compression tools are needed. In this paper, we explore the effects that train/test mismatch conditions have on speech recognition accuracy based on compressed self-supervised speech models. In particular, we report on the effects that parameter…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Pruning