Extracting Targeted Training Data from ASR Models, and How to Mitigate It
Ehsan Amid, Om Thakkar, Arun Narayanan, Rajiv Mathews, Françoise Beaufays

TL;DR
This paper introduces Noise Masking, a fill-in-the-blank style method for extracting targeted parts of the training data from trained automatic speech recognition (ASR) models, and proposes Word Dropout as a mitigation that reduces this leakage.
Contribution
The paper presents the first method to demonstrate information leakage about training data from trained ASR models, where earlier work examined leakage from model updates during training. It also proposes Word Dropout as a mitigation.
Findings
Noise Masking extracts the correct name from masked LibriSpeech training utterances with 11.8% accuracy.
The model outputs some name from the training set 55.2% of the time.
Word Dropout significantly reduces data leakage while maintaining utility.
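This summary does not spell out how Word Dropout works. One plausible reading, given here as an assumption and not as the authors' exact procedure, is a training-time augmentation that randomly replaces the audio of some words with noise and removes those words from the target transcript, so the model learns not to fill in noisy gaps. The sketch below illustrates that reading on a toy waveform. The function name `word_dropout`, the parameters `drop_prob` and `noise_std`, the Gaussian noise, and the word-level alignment format are all hypothetical.

```python
import random


def word_dropout(samples, alignment, drop_prob=0.3, noise_std=0.1, seed=0):
    """Assumed form of Word Dropout (illustrative, not the paper's code).

    samples:   list of float audio samples for one utterance.
    alignment: list of (word, start, end) tuples with sample indices,
               end exclusive. This alignment format is an assumption.
    Returns (augmented_samples, kept_words).
    """
    rng = random.Random(seed)
    augmented = list(samples)  # copy so the input is left untouched
    kept_words = []
    for word, start, end in alignment:
        if rng.random() < drop_prob:
            # Dropped word: overwrite its audio with Gaussian noise
            # and leave it out of the training transcript.
            for i in range(start, end):
                augmented[i] = rng.gauss(0.0, noise_std)
        else:
            kept_words.append(word)
    return augmented, kept_words


# Toy utterance with three words of four samples each.
toy_samples = [0.5] * 12
toy_alignment = [("call", 0, 4), ("alice", 4, 8), ("now", 8, 12)]
toy_audio, toy_transcript = word_dropout(toy_samples, toy_alignment)
```

With `drop_prob=0.0` the utterance and transcript pass through unchanged, and with `drop_prob=1.0` every word is replaced by noise and the transcript is empty. Values in between trade off how often the model sees noisy gaps paired with shortened transcripts.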
Abstract
Recent work has designed methods to demonstrate that model updates in ASR training can leak potentially sensitive attributes of the utterances used in computing the updates. In this work, we design the first method to demonstrate information leakage about training data from trained ASR models. We design Noise Masking, a fill-in-the-blank style method for extracting targeted parts of training data from trained ASR models. We demonstrate the success of Noise Masking by using it in four settings for extracting names from the LibriSpeech dataset used for training a state-of-the-art Conformer model. In particular, we show that we are able to extract the correct names from masked training utterances with 11.8% accuracy, while the model outputs some name from the train set 55.2% of the time. Further, we show that even in a setting that uses synthetic audio and partial transcripts from the test…
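The fill-in-the-blank idea behind Noise Masking can be illustrated with a minimal sketch: the portion of an utterance that carries the target word (here, a name) is replaced with noise, and the trained model is then asked to transcribe the result. The sketch covers only the masking step. It assumes the attacker already knows the sample span of the name and uses Gaussian noise; the abstract does not state the noise type or how the span is located, so `noise_mask`, `noise_std`, and the span indices are hypothetical.

```python
import random


def noise_mask(samples, start, end, noise_std=0.1, seed=0):
    """Return a copy of `samples` with samples[start:end] replaced by noise.

    Illustrative only: Gaussian noise and a known sample span are
    assumptions, not details taken from the paper.
    """
    if not 0 <= start <= end <= len(samples):
        raise ValueError("mask span must lie inside the utterance")
    rng = random.Random(seed)
    masked = list(samples)  # copy so the original utterance is preserved
    for i in range(start, end):
        masked[i] = rng.gauss(0.0, noise_std)
    return masked


# Toy utterance of 10 samples; the hypothetical name occupies samples 4..6.
utterance = [0.5] * 10
masked_utterance = noise_mask(utterance, 4, 7)
```

In an actual attack the masked audio would be passed to the trained ASR model, and the word the model emits at the masked position would be compared with the true name. That step is omitted here because it depends on the specific model and decoding setup.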
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Dropout
