An Experimental Study on Private Aggregation of Teacher Ensemble Learning for End-to-End Speech Recognition

Chao-Han Huck Yang; I-Fan Chen; Andreas Stolcke; Sabato Marco Siniscalchi; Chin-Hui Lee

arXiv:2210.05614·cs.SD·October 17, 2022

TL;DR

This paper extends the Private Aggregation of Teacher Ensembles (PATE) method to end-to-end speech recognition and reports lower word error rates than DP-SGD under strict differential privacy budgets.

Contribution

It introduces a novel application of PATE to dynamic speech data, showing its effectiveness in preventing data leakage and enhancing ASR accuracy under differential privacy.

Findings

01. PATE-based models outperform DP-SGD in speech recognition tasks.

02. Word error rates are reduced significantly (26.2%-27.5%) under strict privacy budgets.

03. The paper proposes a DP-preserving pretraining approach for public speech data.

Abstract

Differential privacy (DP) is one data protection avenue for safeguarding user information used to train deep models: it imposes noisy distortion on private data. Meeting a privacy budget $ε$ with such noise perturbation often causes severe performance degradation in automatic speech recognition (ASR). Private aggregation of teacher ensembles (PATE) uses ensemble probabilities to improve ASR accuracy when dealing with the noise effects controlled by small values of $ε$. We extend PATE learning to work with dynamic patterns, namely speech utterances, and perform a first experimental demonstration that it prevents acoustic data leakage in ASR training. We evaluate three end-to-end deep models, including LAS, hybrid CTC/attention, and RNN transducer, on the open-source LibriSpeech and TIMIT corpora. PATE learning-enhanced ASR models outperform the…
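To illustrate the aggregation idea the abstract refers to, here is a minimal sketch of the noisy-vote step from the original PATE formulation for classification, applied independently at each output step. The function name `noisy_vote_aggregate`, the array layout, and the `noise_scale` parameter are assumptions made for this illustration. The abstract describes the ASR variant as using ensemble probabilities, so the paper's actual aggregation may differ from this vote-count version.

```python
import numpy as np

def noisy_vote_aggregate(teacher_probs, noise_scale, rng):
    """Hypothetical per-step PATE-style aggregation (illustrative only).

    teacher_probs: array of shape (n_teachers, n_steps, n_classes) holding
        each teacher's output distribution at every output step.
    noise_scale: scale of the Laplace noise added to the vote counts;
        a larger scale corresponds to a stricter privacy budget.
    Returns an int array of shape (n_steps,) with the noisy-max labels.
    """
    n_teachers, n_steps, n_classes = teacher_probs.shape

    # Each teacher votes for its most likely class at each step.
    votes = teacher_probs.argmax(axis=-1)  # (n_teachers, n_steps)

    # Count the votes per class at each step.
    counts = np.zeros((n_steps, n_classes))
    for t in range(n_steps):
        counts[t] = np.bincount(votes[:, t], minlength=n_classes)

    # Perturb the counts before taking the argmax, so that no single
    # teacher (and hence no single private data partition) decides the label.
    noisy_counts = counts + rng.laplace(loc=0.0, scale=noise_scale, size=counts.shape)
    return noisy_counts.argmax(axis=-1)
```

In a PATE-style setup, labels produced this way would supervise a student model on non-private data, so the student never sees the teachers' private training sets directly.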

Taxonomy

Topics: Speech Recognition and Synthesis