Using multi-task learning to improve the performance of acoustic-to-word and conventional hybrid models

Thai-Son Nguyen; Sebastian Stueker; Alex Waibel

arXiv:1902.01951 · eess.AS · May 17, 2019 · 1 cites

TL;DR

This paper introduces a multi-task learning approach that trains an acoustic-to-word (A2W) model and a conventional hybrid speech recognition model on a shared neural network. The approach stabilizes A2W training and improves its accuracy without pre-training initialization. Joint training also improves the hybrid model through sequence-level optimization.

Contribution

The paper proposes a novel multi-task training framework that stabilizes acoustic-to-word model training and boosts hybrid model performance, eliminating the need for pre-training initialization.

Findings

1. Multi-task training improves A2W model stability and accuracy.
2. Joint training enhances hybrid model performance with sequence-level optimization.
3. The multi-task A2W model shows a significant improvement over a baseline model.

Abstract

Acoustic-to-word (A2W) models that allow direct mapping from acoustic signals to word sequences are an appealing approach to end-to-end automatic speech recognition due to their simplicity. However, prior works have shown that modelling A2W typically encounters issues of data sparsity that prevent training such a model directly. So far, pre-training initialization is the only approach proposed to deal with this issue. In this work, we propose to build a shared neural network and optimize A2W and conventional hybrid models in a multi-task manner. Our results show that training an A2W model is much more stable with our multi-task model without pre-training initialization, and results in a significant improvement compared to a baseline model. Experiments also reveal that the performance of a hybrid acoustic model can be further improved when jointly training with a sequence-level…
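
The shared-network idea in the abstract can be illustrated with a minimal sketch in plain Python. Everything in it is an assumption made for illustration and is not taken from the paper: the names (`multitask_loss`, `task_weight`, `word_head`, `state_head`), the toy layer sizes, the single shared tanh layer, and the use of a per-frame cross-entropy as a stand-in for both task losses. The excerpt above does not state which loss functions or network layers the authors use.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, vec):
    # Matrix-vector product; weights is a list of rows.
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def cross_entropy(probs, target):
    # Negative log-probability of the target class.
    return -math.log(probs[target])


def init_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def multitask_loss(frame, word_target, state_target, params, task_weight=0.5):
    # Shared layer: both tasks read the same hidden representation.
    hidden = [math.tanh(h) for h in linear(params["shared"], frame)]
    # Task-specific output layers (hypothetical word and hybrid-state heads).
    word_probs = softmax(linear(params["word_head"], hidden))
    state_probs = softmax(linear(params["state_head"], hidden))
    loss_word = cross_entropy(word_probs, word_target)
    loss_state = cross_entropy(state_probs, state_target)
    # Weighted sum of the two task losses (the weighting scheme is assumed).
    total = task_weight * loss_word + (1.0 - task_weight) * loss_state
    return total, loss_word, loss_state


# Toy dimensions, chosen only so the example runs quickly.
FEAT_DIM, HIDDEN_DIM, NUM_WORDS, NUM_STATES = 4, 8, 10, 6
rng = random.Random(0)
params = {
    "shared": init_matrix(HIDDEN_DIM, FEAT_DIM, rng),
    "word_head": init_matrix(NUM_WORDS, HIDDEN_DIM, rng),
    "state_head": init_matrix(NUM_STATES, HIDDEN_DIM, rng),
}
frame = [0.2, -0.1, 0.4, 0.0]
total, loss_word, loss_state = multitask_loss(frame, 3, 2, params)
```

Because both heads depend on the same `shared` weights, a gradient step on `total` would update those weights from both tasks at once. A real A2W system would use a sequence-level loss over whole utterances rather than the per-frame stand-in shown here.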

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing