Detecting Unintended Memorization in Language-Model-Fused ASR

W. Ronny Huang; Steve Chien; Om Thakkar; Rajiv Mathews

arXiv:2204.09606 · cs.CL · June 29, 2022 · 1 citation

TL;DR

This paper presents a framework for detecting unintended memorization of rare sequences by the language model (LM) fused with an end-to-end speech recognizer, demonstrates it on a production-grade model, and shows that per-example gradient clipping during LM training reduces such memorization.

Contribution

It introduces a black-box method for detecting memorization in LM-fused speech recognizers and shows that per-example gradient clipping during LM training reduces such memorization.
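As a hedged illustration of what a black-box test can look like (not necessarily the paper's exact procedure), assume we have audio for canaries that were inserted into the LM training data and for held-out canaries that were not, and that we can only read the recognizer's transcripts. If the fused LM has memorized the inserted canaries, they should be transcribed more accurately. The sketch below computes word error rate (WER) with a word-level Levenshtein distance and reports the gap in mean WER between the two groups; the function names and the (reference, transcript) pair format are hypothetical.

```python
from statistics import mean


def word_edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two transcripts, counted in words."""
    r, h = ref.split(), hyp.split()
    # prev[j] holds the distance between the first i-1 ref words and the first j hyp words.
    prev = list(range(len(h) + 1))
    for i in range(1, len(r) + 1):
        cur = [i] + [0] * len(h)
        for j in range(1, len(h) + 1):
            sub = prev[j - 1] + (r[i - 1] != h[j - 1])
            cur[j] = min(prev[j] + 1,      # deletion
                         cur[j - 1] + 1,   # insertion
                         sub)              # substitution or match
        prev = cur
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate of one transcript against its reference."""
    return word_edit_distance(ref, hyp) / max(len(ref.split()), 1)


def memorization_gap(inserted, held_out) -> float:
    """Mean WER on held-out canaries minus mean WER on inserted canaries.

    Each argument is a list of (reference_text, recognizer_transcript) pairs.
    A clearly positive gap suggests the LM-fused recognizer favors the
    canaries it saw during LM training.
    """
    wer_inserted = mean(wer(ref, hyp) for ref, hyp in inserted)
    wer_held_out = mean(wer(ref, hyp) for ref, hyp in held_out)
    return wer_held_out - wer_inserted
```

With only a handful of canaries the gap is anecdotal; a real study would use many canaries per group and a significance test before concluding that memorization occurred.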

Findings

1. Detecting canary memorization in a 300M-example LM training set is feasible.
2. Gradient clipping significantly reduces memorization without harming recognition quality.
3. Memorization of rare sequences can be mitigated to protect privacy.
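Per the abstract, the clipping in question is per-example gradient clipping during LM training. A minimal sketch of that step, assuming gradients are flattened into plain Python lists: each example's gradient is rescaled so its L2 norm is at most a clip norm C before averaging, so no single example (such as a singly-occurring canary) can move the averaged update by more than C divided by the batch size. This is the clipping half of DP-SGD without the added noise; the function name and `clip_norm` parameter are hypothetical, and a real trainer would do this on accelerator tensors.

```python
import math


def clipped_mean_gradient(per_example_grads, clip_norm):
    """Average per-example gradients after clipping each to L2 norm <= clip_norm."""
    if clip_norm <= 0:
        raise ValueError("clip_norm must be positive")
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        # Scale down only gradients whose norm exceeds the threshold.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for k, x in enumerate(g):
            total[k] += scale * x
    return [t / len(per_example_grads) for t in total]


# One outlier example with a large gradient and one ordinary example.
grads = [[3.0, 4.0], [0.3, 0.4]]
print(clipped_mean_gradient(grads, clip_norm=1.0))  # approximately [0.45, 0.6]
```

Without clipping, the same two gradients would average to roughly [1.65, 2.2], dominated by the outlier; clipping bounds that single example's influence.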

Abstract

End-to-end (E2E) models are often accompanied by language models (LMs) via shallow fusion to boost their overall quality as well as their recognition of rare words. At the same time, several prior works show that LMs are susceptible to unintentionally memorizing rare or unique sequences in the training data. In this work, we design a framework for detecting memorization of random textual sequences (which we call canaries) in the LM training data when one has only black-box (query) access to the LM-fused speech recognizer, as opposed to direct access to the LM. On a production-grade Conformer RNN-T E2E model fused with a Transformer LM, we show that detecting memorization of singly-occurring canaries from the LM training data of 300M examples is possible. Motivated to protect privacy, we also show that such memorization is significantly reduced by per-example gradient-clipped LM…
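Shallow fusion is commonly written as choosing the hypothesis y that maximizes log P_E2E(y | x) + λ log P_LM(y), where λ is a tuned LM weight. The sketch below applies that score to a few complete hypotheses for simplicity; production decoders usually apply it token by token inside beam search and may add a length reward, both omitted here. The hypotheses and scores are toy values invented for illustration. They also hint at why memorization can be detected through the recognizer: a string the LM has memorized gets an unusually high LM score, which can outweigh a slightly worse E2E score.

```python
def fused_score(e2e_logprob: float, lm_logprob: float, lm_weight: float) -> float:
    """Shallow-fusion score: log P_E2E(y|x) + lm_weight * log P_LM(y)."""
    return e2e_logprob + lm_weight * lm_logprob


def best_hypothesis(hyps, lm_weight):
    """Return the text of the hypothesis with the highest fused score.

    hyps is a list of (text, e2e_logprob, lm_logprob) tuples.
    """
    return max(hyps, key=lambda h: fused_score(h[1], h[2], lm_weight))[0]


# Toy scores: the E2E model slightly prefers the second hypothesis,
# but the LM strongly prefers the first (e.g. because it memorized that string).
hyps = [
    ("my code is 4 8 2 1", -5.0, -2.0),
    ("my code is 4 8 1 2", -4.8, -9.0),
]
print(best_hypothesis(hyps, lm_weight=0.0))  # my code is 4 8 1 2
print(best_hypothesis(hyps, lm_weight=0.5))  # my code is 4 8 2 1
```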

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques

Methods: Attention Is All You Need · Linear Layer · Label Smoothing · Adam · Multi-Head Attention · Residual Connection · Absolute Position Encodings · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Dense Connections