On the Effectiveness of Pinyin-Character Dual-Decoding for End-to-End Mandarin Chinese ASR

Zhao Yang; Dianwen Ng; Xiao Fu; Liping Han; Wei Xi; Rui Wang; Rui Jiang; Jizhong Zhao

arXiv:2201.10792·cs.CL·March 31, 2022·1 cites

Open Access

TL;DR

This paper introduces a novel dual-decoding approach leveraging Pinyin and Character relationships in Mandarin Chinese ASR, demonstrating significant improvements over baseline models without using a language model.

Contribution

It proposes an asynchronous decoding method with fuzzy Pinyin sampling and a two-stage training strategy for end-to-end Mandarin Chinese ASR.

Findings

01 Significant accuracy improvements on AISHELL-1 dataset

02 Effective utilization of Pinyin-Character mutual promotion

03 Enhanced dual-decoder model outperforms strong baselines

Abstract

End-to-end automatic speech recognition (ASR) has achieved promising results. However, most existing end-to-end ASR methods neglect language-specific characteristics. In Mandarin Chinese ASR, Pinyin and Character are mutually reinforcing, because Chinese characters can be romanized as Pinyin. Based on this intuition, we first investigate several types of end-to-end encoder-decoder models in the single-input dual-output (SIDO) multi-task framework. We then propose a novel asynchronous decoding method with fuzzy Pinyin sampling, which exploits the one-to-one correspondence between Pinyin and Character. Furthermore, we propose a two-stage training strategy that makes training more stable and speeds up convergence. The results on the test sets of AISHELL-1 dataset show that the proposed enhanced dual-decoder model without a language…
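The abstract names fuzzy Pinyin sampling but this page does not describe the procedure. The following is only a rough sketch of what such sampling could look like, under stated assumptions: the confusion pairs come from the "fuzzy pinyin" options common in Chinese input methods, and the names `FUZZY_INITIALS`, `FUZZY_FINALS`, `fuzzy_variants`, `fuzzy_sample`, and the probability `p` are hypothetical, not taken from the paper. The sketch replaces each toneless syllable with a confusable variant with probability `p`, and keeps the sequence length unchanged so the one-to-one alignment with characters is preserved.

```python
import random

# ASSUMPTION: confusion pairs borrowed from common "fuzzy pinyin" input-method
# settings; the paper's actual confusion set is not given on this page.
FUZZY_INITIALS = {"zh": "z", "z": "zh", "ch": "c", "c": "ch",
                  "sh": "s", "s": "sh", "n": "l", "l": "n"}
FUZZY_FINALS = {"an": "ang", "ang": "an", "en": "eng", "eng": "en",
                "in": "ing", "ing": "in"}


def split_syllable(syllable):
    """Split a toneless pinyin syllable into (initial, final)."""
    for initial in ("zh", "ch", "sh"):
        if syllable.startswith(initial):
            return initial, syllable[2:]
    if syllable and syllable[0] in "bpmfdtnlgkhjqxrzcsyw":
        return syllable[0], syllable[1:]
    return "", syllable  # zero-initial syllable such as "an"


def fuzzy_variants(syllable):
    """Return confusable variants of a syllable (not checked for validity)."""
    initial, final = split_syllable(syllable)
    variants = []
    if initial in FUZZY_INITIALS:
        variants.append(FUZZY_INITIALS[initial] + final)
    if final in FUZZY_FINALS:
        variants.append(initial + FUZZY_FINALS[final])
    return variants


def fuzzy_sample(syllables, p=0.1, rng=None):
    """Replace each syllable by a fuzzy variant with probability p.

    The output has the same length as the input, so each pinyin token
    still lines up with exactly one character.
    """
    rng = rng or random.Random()
    sampled = []
    for syllable in syllables:
        variants = fuzzy_variants(syllable)
        if variants and rng.random() < p:
            sampled.append(rng.choice(variants))
        else:
            sampled.append(syllable)
    return sampled
```

For instance, `fuzzy_sample(["zhong", "guo"], p=0.0)` leaves the sequence untouched, while a larger `p` perturbs syllables that have an entry in the hypothetical tables. A real system would probably also check each variant against the valid syllable inventory, which this sketch does not do.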

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing

Methods: Attention Is All You Need · Linear Layer · Softmax · Dense Connections · Position-Wise Feed-Forward Layer · Adam · Multi-Head Attention · Label Smoothing · Absolute Position Encodings · Residual Connection