Homophone-based Label Smoothing in End-to-End Automatic Speech Recognition

Yi Zheng; Xianjie Yang; Xuyong Dang

arXiv:2004.03437·eess.AS·May 15, 2020·1 cites

Open Access

TL;DR

This paper introduces a homophone-based label smoothing technique for end-to-end ASR that uses pronunciation knowledge to improve recognition accuracy, reducing character error rate (CER) by 0.4% absolute in experiments with a hybrid CTC sequence-to-sequence model.

Contribution

The paper proposes a new label smoothing method that uses homophone knowledge to improve the performance of end-to-end speech recognition models.

Findings

01 Reduces character error rate (CER) by 0.4% absolute

02 Uses pronunciation knowledge of homophones in label smoothing

03 Applicable to models learning acoustic and language models jointly
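To make the first finding concrete, the sketch below shows how a character error rate is typically computed and what an "absolute" reduction means. It is an illustration only: the function names `edit_distance`, `cer`, and `absolute_reduction` are hypothetical, and the toy strings are not data from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free on a match)
            ))
        prev = curr
    return prev[-1]


def cer(refs, hyps):
    """Character error rate: total edit errors / total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


def absolute_reduction(baseline_cer, new_cer):
    """Absolute reduction is a plain difference of the two rates."""
    return baseline_cer - new_cer


# Toy example (not from the paper): one homophone substitution in three characters.
toy_cer = cer(["他在家"], ["她在家"])
```

An absolute reduction of 0.4% therefore means the CER drops by 0.4 percentage points, independent of the size of the baseline CER.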

Abstract

This paper proposes a new label smoothing method for automatic speech recognition (ASR) that makes use of human-level prior knowledge of a language: homophones. Compared with its forerunners, the proposed method uses pronunciation knowledge of homophones in a more complex way. The method requires an end-to-end ASR model that learns the acoustic model and the language model jointly and that uses characters as modelling units. Experiments with a hybrid CTC sequence-to-sequence model show that the new method can reduce character error rate (CER) by 0.4% absolute.
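The abstract does not spell out how the smoothing mass is distributed, so the following is only a minimal sketch of the general idea, under a stated assumption: the smoothing weight `epsilon` is shared equally among the homophones of the target character instead of across the whole vocabulary. The paper's actual scheme is described as more complex. The names `homophone_smoothed_targets`, `cross_entropy`, and the toy `homophones` table are hypothetical.

```python
import math


def homophone_smoothed_targets(target, vocab, homophones, epsilon=0.1):
    """Build a smoothed target distribution over `vocab` for one character.

    Assumption (not from the paper): the `epsilon` mass is split equally
    among the target's homophones. If the target has no homophones in the
    vocabulary, fall back to uniform smoothing over all other characters.
    """
    others = [c for c in homophones.get(target, ()) if c in vocab and c != target]
    if not others:
        others = [c for c in vocab if c != target]
    dist = {c: 0.0 for c in vocab}
    if not others:
        # Single-character vocabulary: nothing to smooth towards.
        dist[target] = 1.0
        return dist
    dist[target] = 1.0 - epsilon
    for c in others:
        dist[c] += epsilon / len(others)
    return dist


def cross_entropy(dist, log_probs):
    """Cross-entropy between a smoothed target and model log-probabilities."""
    return -sum(p * log_probs[c] for c, p in dist.items() if p > 0.0)


# Toy vocabulary: the first three characters are all pronounced "ta" in Mandarin.
vocab = ["他", "她", "它", "好"]
homophones = {"他": ["她", "它"], "她": ["他", "它"], "它": ["他", "她"]}

smoothed = homophone_smoothed_targets("他", vocab, homophones, epsilon=0.1)
uniform_log_probs = {c: math.log(1.0 / len(vocab)) for c in vocab}
loss = cross_entropy(smoothed, uniform_log_probs)
```

In this sketch the non-homophone character "好" receives no probability mass, so the model is penalised less for confusing characters that sound identical than for unrelated errors. This only makes sense when the output units are characters, which matches the requirement stated in the abstract.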

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Label Smoothing