Homophone-based Label Smoothing in End-to-End Automatic Speech Recognition
Yi Zheng, Xianjie Yang, Xuyong Dang

TL;DR
This paper introduces a homophone-based label smoothing technique for end-to-end ASR that uses pronunciation knowledge to improve recognition accuracy. In experiments with a hybrid CTC sequence-to-sequence model, it reduces character error rate (CER) by 0.4% absolute.
Contribution
It proposes a new label smoothing method that uses homophone knowledge to improve the performance of end-to-end speech recognition models.
Findings
Reduces character error rate by 0.4% absolute
Uses pronunciation knowledge of homophones in label smoothing
Requires end-to-end models that learn the acoustic and language models jointly and use characters as modelling units
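To make the "0.4% absolute" figure concrete, here is a minimal sketch of how CER is commonly computed: the character-level edit distance between reference and hypothesis, divided by the reference length. The function names `edit_distance`, `cer`, and `absolute_reduction` are illustrative and not taken from the paper, and the numbers in the usage note are hypothetical, not reported results.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference characters
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance normalised by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def absolute_reduction(baseline_cer, new_cer):
    """Absolute (not relative) CER reduction, as a plain difference."""
    return baseline_cer - new_cer
```

For example, with hypothetical values, a baseline CER of 8.0% and a new CER of 7.6% give `absolute_reduction(0.080, 0.076)` of about 0.004, that is 0.4% absolute, whereas the relative reduction would be 5%.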
Abstract
This paper proposes a new label smoothing method for automatic speech recognition (ASR) that makes use of human-level prior knowledge of a language, namely homophones. Compared with its forerunners, the proposed method uses pronunciation knowledge of homophones in a more complex way. The method requires end-to-end ASR models that learn the acoustic model and the language model jointly and that use characters as modelling units. Experiments with a hybrid CTC sequence-to-sequence model show that the new method can reduce character error rate (CER) by 0.4% absolute.
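The abstract does not give the exact formulation, so the following is only a sketch of one plausible reading: instead of spreading the smoothing mass uniformly over the vocabulary, it is shared among the homophones of the target character. The function name `homophone_smoothed_target`, the `homophones` lookup table, the value of `eps`, and the uniform fallback for characters without homophones are all assumptions made for illustration, not the authors' method.

```python
import numpy as np


def homophone_smoothed_target(target_id, vocab_size, homophones, eps=0.1):
    """Build a smoothed target distribution for one character.

    homophones: hypothetical dict mapping a character id to the ids of
    characters that share its pronunciation.
    """
    dist = np.zeros(vocab_size)
    # Unique homophones of the target, excluding the target itself.
    peers = sorted(set(homophones.get(target_id, [])) - {target_id})
    if peers:
        # Assumption: the smoothing mass eps is split evenly among homophones.
        dist[peers] = eps / len(peers)
    else:
        # Assumption: fall back to uniform label smoothing over other tokens.
        dist[:] = eps / (vocab_size - 1)
    dist[target_id] = 1.0 - eps
    return dist


def smoothed_cross_entropy(log_probs, target_dist):
    """Cross-entropy between a target distribution and model log-probabilities."""
    return float(-(target_dist * log_probs).sum())
```

With a toy vocabulary of five characters where characters 1 and 2 are homophones of character 0, `homophone_smoothed_target(0, 5, {0: [1, 2]})` puts 0.9 on the target and 0.05 on each homophone, leaving the unrelated characters at zero. The resulting distribution would replace the one-hot target in the decoder's cross-entropy loss.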
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
Methods: Label Smoothing
