Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese

Shiyu Zhou; Linhao Dong; Shuang Xu; Bo Xu

arXiv:1804.10752 · eess.AS · June 5, 2018 · 28 cites

TL;DR

This paper applies the Transformer model to Mandarin Chinese speech recognition, comparing syllable-based and phoneme-based approaches, and introduces a cascading decoder to improve word sequence mapping, achieving competitive CER results.

Contribution

It extends the Transformer architecture to Mandarin Chinese ASR, compares syllable and phoneme representations, and proposes a cascading decoder for improved sequence mapping.

Findings

01

Syllable-based Transformer model outperforms phoneme-based model.

02

Achieves a character error rate (CER) of 28.77%, competitive with the state of the art.

03

Proposed cascading decoder effectively maps phoneme and syllable sequences to words.
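
CER, the metric reported in the findings above, is conventionally the character-level edit distance between a hypothesis and its reference transcript, divided by the reference length. The following is a minimal sketch of that standard definition, not the authors' scoring script; the function names `edit_distance` and `cer` and the example sentences are assumptions made for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of r
                curr[j - 1] + 1,     # insertion of h
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one substituted character in a six-character reference.
print(f"{cer('今天天气很好', '今天天汽很好'):.2%}")  # prints 16.67%
```

Because Mandarin text is scored per character, no word segmentation is needed before computing the metric. Note that CER can exceed 100% when the hypothesis contains many insertions.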

Abstract

Sequence-to-sequence attention-based models, which integrate an acoustic, pronunciation and language model into a single neural network, have recently shown very promising results on automatic speech recognition (ASR) tasks. Among these models, the Transformer, a new sequence-to-sequence attention-based model relying entirely on self-attention without using RNNs or convolutions, achieved a new single-model state-of-the-art BLEU score on neural machine translation (NMT) tasks. Given the outstanding performance of the Transformer, we extend it to speech and adopt it as the basic architecture of the sequence-to-sequence attention-based model for Mandarin Chinese ASR tasks. Furthermore, we compare a syllable-based model with a context-independent phoneme (CI-phoneme) based model using the Transformer in Mandarin Chinese. Additionally, a greedy cascading decoder with the…

Code & Models

Repositories

gentaiscool/end2end-asr-pytorch
pytorch

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax