On the Effectiveness of Pinyin-Character Dual-Decoding for End-to-End Mandarin Chinese ASR
Zhao Yang, Dianwen Ng, Xiao Fu, Liping Han, Wei Xi, Rui Wang, Rui Jiang, Jizhong Zhao

TL;DR
This paper introduces a novel dual-decoding approach leveraging Pinyin and Character relationships in Mandarin Chinese ASR, demonstrating significant improvements over baseline models without using a language model.
Contribution
It proposes an asynchronous decoding method with fuzzy Pinyin sampling and a two-stage training strategy for end-to-end Mandarin Chinese ASR.
Findings
Significant accuracy improvements on the AISHELL-1 dataset
Effective utilization of Pinyin-Character mutual promotion
Enhanced dual-decoder model outperforms strong baselines
Abstract
End-to-end automatic speech recognition (ASR) has achieved promising results. However, most existing end-to-end ASR methods neglect language-specific characteristics. For Mandarin Chinese ASR tasks, there exists a mutual promotion relationship between Pinyin and Character, since Chinese characters can be romanized as Pinyin. Based on this intuition, we first investigate types of end-to-end encoder-decoder models in the single-input dual-output (SIDO) multi-task framework, and then propose a novel asynchronous decoding method with fuzzy Pinyin sampling that exploits the one-to-one correspondence between Pinyin and Character. Furthermore, we propose a two-stage training strategy to make training more stable and converge faster. The results on the test sets of the AISHELL-1 dataset show that the proposed enhanced dual-decoder model without a language…
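The one-to-one correspondence mentioned in the abstract can be illustrated with a small sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy lexicon `CHAR_TO_PINYIN`, the confusion sets in `FUZZY_SETS`, and the helper names `to_pinyin` and `fuzzy_sample` are hypothetical, and the substitution rule is only one plausible reading of "fuzzy Pinyin sampling", not the authors' method.

```python
import random

# Toy lexicon (illustrative assumption): each character maps to exactly one
# tone-numbered Pinyin syllable, so both sequences have the same length.
CHAR_TO_PINYIN = {"语": "yu3", "音": "yin1", "识": "shi2", "别": "bie2"}

# Hypothetical confusion sets of similar-sounding syllables.
FUZZY_SETS = {"yin1": ["ying1"], "shi2": ["si2"]}


def to_pinyin(chars):
    """Map a character string to its position-aligned Pinyin sequence."""
    return [CHAR_TO_PINYIN[c] for c in chars]


def fuzzy_sample(pinyin, p, rng):
    """With probability p, swap a syllable for one from its confusion set."""
    out = []
    for syllable in pinyin:
        alternatives = FUZZY_SETS.get(syllable)
        if alternatives and rng.random() < p:
            out.append(rng.choice(alternatives))
        else:
            out.append(syllable)
    return out


chars = "语音识别"
pinyin = to_pinyin(chars)
noisy = fuzzy_sample(pinyin, p=0.5, rng=random.Random(0))

# Substitution never changes the length, so alignment with the characters holds.
assert len(chars) == len(pinyin) == len(noisy)
print(pinyin)
print(noisy)
```

Because the perturbed Pinyin sequence stays aligned position by position with the character sequence, a character decoder could in principle be conditioned on it without any re-alignment step; whether the paper does this is not stated in the truncated abstract.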
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing
Methods: Attention Is All You Need · Linear Layer · Softmax · Dense Connections · Position-Wise Feed-Forward Layer · Adam · Multi-Head Attention · Label Smoothing · Absolute Position Encodings · Residual Connection
