End-to-end speech recognition modeling from de-identified data

Martin Flechl; Shou-Chun Yin; Junho Park; Peter Skala

arXiv:2207.05469·eess.AS·July 13, 2022

TL;DR

This paper presents a two-step method for partially recovering the speech recognition accuracy lost when training data is de-identified: each PII occurrence is replaced with a random word sequence of the same category, and matching artificial audio is produced via text-to-speech or by splicing audio fragments from the corpus.

Contribution

The authors introduce a novel approach combining artificial audio generation and data augmentation to mitigate performance loss from de-identification in speech recognition models.

Findings

01

Recovered up to 90% of performance degradation for PII recognition.

02

Maintained strong diarization performance despite data modifications.

03

Effective across different PII categories in medical speech data.

Abstract

De-identification of data used for automatic speech recognition modeling is a critical component in protecting privacy, especially in the medical domain. However, simply removing all personally identifiable information (PII) from end-to-end model training data leads to a significant performance degradation in particular for the recognition of names, dates, locations, and words from similar categories. We propose and evaluate a two-step method for partially recovering this loss. First, PII is identified, and each occurrence is replaced with a random word sequence of the same category. Then, corresponding audio is produced via text-to-speech or by splicing together matching audio fragments extracted from the corpus. These artificial audio/label pairs, together with speaker turns from the original data without PII, are used to train models. We evaluate the performance of this method on…
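The first step described in the abstract (replacing each identified PII occurrence with a random word sequence of the same category) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes PII spans have already been identified and tagged upstream, and the `SURROGATES` pools, the category names, and the `replace_pii` function are hypothetical. The second step, producing matching audio via text-to-speech or splicing, is not shown.

```python
import random

# Hypothetical surrogate pools (assumption); a real system would use far larger lists.
SURROGATES = {
    "NAME": ["alex morgan", "jamie lee", "sam patel"],
    "DATE": ["march third", "july twentieth"],
    "LOCATION": ["springfield", "lake view"],
}


def replace_pii(tokens, spans, surrogates=SURROGATES, rng=None):
    """Replace each tagged PII span with a random same-category word sequence.

    tokens: list of words in one speaker turn.
    spans:  non-overlapping (start, end, category) tuples, end exclusive.
    Returns (new_tokens, new_spans); new_spans locate the surrogates so that
    a later step could generate artificial audio for exactly those words.
    """
    rng = rng or random.Random()
    out, new_spans, cursor = [], [], 0
    for start, end, category in sorted(spans):
        # Copy the non-PII words that precede this span unchanged.
        out.extend(tokens[cursor:start])
        replacement = rng.choice(surrogates[category]).split()
        new_spans.append((len(out), len(out) + len(replacement), category))
        out.extend(replacement)
        cursor = end
    # Copy whatever follows the last PII span.
    out.extend(tokens[cursor:])
    return out, new_spans


tokens = "patient john smith was seen on may fifth".split()
spans = [(1, 3, "NAME"), (6, 8, "DATE")]
new_tokens, new_spans = replace_pii(tokens, spans, rng=random.Random(0))
print(" ".join(new_tokens))
```

Returning the surrogate positions alongside the new transcript keeps the label and the audio-generation step aligned, since only the replaced words need artificial audio.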

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling