Multilingual sequence-to-sequence speech recognition: architecture, transfer learning, and language modeling

Jaejin Cho; Murali Karthick Baskar; Ruizhi Li; Matthew Wiesner; Sri Harish Mallidi; Nelson Yalta; Martin Karafiat; Shinji Watanabe; Takaaki Hori

arXiv:1810.03459·cs.CL·October 9, 2018·5 cites

Open Access

TL;DR

This paper explores multilingual seq2seq speech recognition, demonstrating that transfer learning from a multilingual model and integrating RNNLMs significantly improve low-resource language recognition performance.

Contribution

It introduces a transfer learning approach from a multilingual seq2seq model to low-resource languages and evaluates different architectures and language model integration strategies.
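The transfer step can be illustrated with a minimal sketch. It assumes, hypothetically, that layers independent of the output symbol set are copied from the multilingual prior model and that the vocabulary-sized output layer is re-created for the target language. The summary above does not say which layers the authors transfer, so the layer names (`encoder.l0`, `decoder.l0`, `output.proj`), the sizes, and the initialization are illustrative assumptions only.

```python
import random


def init_layer(rows, cols, rng):
    """Small random weights for a freshly created layer (hypothetical init)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def transfer_parameters(prior, target_vocab_size, hidden_size, seed=0):
    """Copy shared layers from the prior model; re-create vocabulary-sized ones.

    Assumption: any parameter whose name starts with "output." depends on the
    symbol set and therefore cannot be reused for a new language as-is.
    """
    rng = random.Random(seed)
    target = {}
    for name, weights in prior.items():
        if name.startswith("output."):
            # New language, new symbol set: start this layer from scratch.
            target[name] = init_layer(target_vocab_size, hidden_size, rng)
        else:
            # Language-independent layer: copy the multilingual weights.
            target[name] = [row[:] for row in weights]
    return target


# Toy "multilingual prior" with made-up sizes.
hidden = 4
rng = random.Random(1)
prior_model = {
    "encoder.l0": init_layer(hidden, hidden, rng),
    "decoder.l0": init_layer(hidden, hidden, rng),
    "output.proj": init_layer(50, hidden, rng),  # 50 multilingual symbols
}

# Port to a target language with 30 symbols, then fine-tune on its data.
target_model = transfer_parameters(prior_model, target_vocab_size=30, hidden_size=hidden)
```

After this initialization, the target model would be fine-tuned on the low-resource language's data; which layers are kept frozen during fine-tuning is a separate design choice not covered here.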

Findings

01

Transfer learning from the multilingual model yields substantial gains over monolingual models across all 4 target BABEL languages.

02

RNNLM integration significantly improves recognition accuracy.

03

With RNNLM integration, recognition performance is comparable to that of models trained on twice the data.

Abstract

The sequence-to-sequence (seq2seq) approach to low-resource ASR is a relatively new direction in speech research. Its advantage is that models can be trained without a lexicon or alignments. However, this poses a new problem: seq2seq models require more data than conventional DNN-HMM systems. In this work, we use data from 10 BABEL languages to build a multilingual seq2seq model as a prior model, and then port it to 4 other BABEL languages using a transfer learning approach. We also explore different architectures for improving the prior multilingual seq2seq model. The paper also discusses the effect of integrating a recurrent neural network language model (RNNLM) with a seq2seq model during decoding. Experimental results show that transfer learning from the multilingual model yields substantial gains over monolingual models across all 4 BABEL…
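As an illustration of what integrating an RNNLM during decoding can look like, the sketch below uses shallow fusion, one common scheme in which the seq2seq and language model log-probabilities are combined log-linearly at each decoding step. The abstract as shown here does not state which combination rule the authors use, so the rule, the weight `lm_weight`, and the toy probabilities are assumptions made for the example.

```python
import math


def fuse_scores(s2s_log_probs, lm_log_probs, lm_weight):
    """Shallow fusion: log p_s2s(token) + lm_weight * log p_lm(token)."""
    return {
        token: s2s_log_probs[token] + lm_weight * lm_log_probs[token]
        for token in s2s_log_probs
    }


def pick_next_token(s2s_log_probs, lm_log_probs, lm_weight=0.3):
    """Greedy choice of the next token under the fused score."""
    fused = fuse_scores(s2s_log_probs, lm_log_probs, lm_weight)
    return max(fused, key=fused.get)


# Toy next-token distributions (made-up numbers).
s2s = {"a": math.log(0.5), "b": math.log(0.4), "c": math.log(0.1)}
lm = {"a": math.log(0.1), "b": math.log(0.8), "c": math.log(0.1)}

without_lm = pick_next_token(s2s, lm, lm_weight=0.0)  # acoustic model alone
with_lm = pick_next_token(s2s, lm, lm_weight=0.3)     # language model shifts the choice
```

In this toy case the seq2seq model alone slightly prefers `"a"`, while the fused score prefers `"b"` because the language model considers it far more likely. In a real decoder the same fused score would be applied to every hypothesis in a beam search rather than to a single greedy step.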

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Natural Language Processing Techniques

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence