Telephonetic: Making Neural Language Models Robust to ASR and Semantic Noise
Chris Larson, Tarek Lahlou, Diana Mingels, Zachary Kulis, Erik Mueller

TL;DR
Telephonetic introduces a data augmentation framework that enhances neural language models' robustness to ASR errors by combining phonetic and semantic perturbations, improving performance on speech-related tasks.
Contribution
The paper presents Telephonetic, a novel data augmentation method that leverages phonetic and semantic perturbations to adapt language models for noisy speech inputs.
Findings
Achieved a state-of-the-art perplexity of 37.49 on Penn Treebank (PTB) among models trained only on PTB.
Demonstrated the effectiveness of Telephonetic as a bootstrapping technique for speech domain adaptation.
Enhanced language model robustness to ASR errors through combined phonetic and semantic augmentation.
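For reference, the PTB result above is reported as perplexity. A standard definition, for a model that assigns probability p(w_i | w_<i) to each of N held-out tokens, is given below. As a worked example, a perplexity of 37.49 corresponds to an average negative log-likelihood of ln 37.49 ≈ 3.62 nats (about 5.23 bits) per token. The exact tokenization and evaluation protocol used in the paper are not stated on this page.

```latex
\mathrm{PPL} = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\log p(w_i \mid w_{<i})\right),
\qquad
-\frac{1}{N}\sum_{i=1}^{N}\log p(w_i \mid w_{<i}) = \ln \mathrm{PPL} = \ln 37.49 \approx 3.62\ \text{nats}
```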
Abstract
Speech processing systems rely on robust feature extraction to handle phonetic and semantic variations found in natural language. While techniques exist for desensitizing features to common noise patterns produced by Speech-to-Text (STT) and Text-to-Speech (TTS) systems, the question remains how to best leverage state-of-the-art language models (which capture rich semantic features, but are trained only on written text) on inputs with ASR errors. In this paper, we present Telephonetic, a data augmentation framework that helps robustify language model features to ASR-corrupted inputs. To capture phonetic alterations, we employ a character-level language model trained using probabilistic masking. Phonetic augmentations are generated in two stages: a TTS encoder (Tacotron 2, WaveGlow) and an STT decoder (DeepSpeech). Similarly, semantic perturbations are produced by sampling from nearby…
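The abstract names two mechanisms for phonetic noise: probabilistic masking of a character-level model's input, and a TTS-then-STT round trip whose recognition errors become augmented text. The Python sketch below illustrates both under stated assumptions; it is not the authors' code. The mask symbol `_`, the per-character masking rate `p`, the choice to leave spaces unmasked, and the function names `mask_characters`, `phonetic_round_trip`, `toy_tts`, and `toy_stt` are all hypothetical. The TTS and STT systems (Tacotron 2/WaveGlow and DeepSpeech in the paper) are passed in as plain callables rather than called through any real library API.

```python
import random
from typing import Callable, List

# Hypothetical mask symbol; the paper's actual masking token is not given here.
MASK = "_"


def mask_characters(text: str, p: float, rng: random.Random, mask: str = MASK) -> str:
    """Independently replace each non-space character with `mask` with probability p.

    Spaces are kept so word boundaries survive (an assumption for this sketch).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    return "".join(
        mask if ch != " " and rng.random() < p else ch
        for ch in text
    )


def phonetic_round_trip(
    text: str,
    tts: Callable[[str], List[float]],
    stt: Callable[[List[float]], str],
) -> str:
    """Synthesize `text` to audio, then transcribe it back.

    Any recognition errors in the transcript act as phonetic augmentations.
    `tts` and `stt` are placeholders for real speech systems.
    """
    audio = tts(text)
    return stt(audio)


# Toy stand-ins (hypothetical): a lossless "TTS" that encodes characters as
# floats, and an "STT" that confuses the homophones "their" and "there".
def toy_tts(text: str) -> List[float]:
    return [float(ord(ch)) for ch in text]


def toy_stt(audio: List[float]) -> str:
    return "".join(chr(int(x)) for x in audio).replace("their", "there")


if __name__ == "__main__":
    rng = random.Random(0)
    print(mask_characters("their cat sat", 0.3, rng))
    print(phonetic_round_trip("their cat sat", toy_tts, toy_stt))  # there cat sat
```

Injecting the speech systems as callables keeps the augmentation logic independent of any particular TTS or STT toolkit; swapping in real models would only change the two functions passed to `phonetic_round_trip`.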
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques
Methods: Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece · Softmax
