Hierarchical Multi Task Learning With CTC
Ramon Sanabria, Florian Metze

TL;DR
This paper introduces a hierarchical multi-task learning approach using CTC for speech recognition. It improves intermediate representations and outperforms prior acoustic-to-word systems on Switchboard, reaching 14.0% WER on Eval2000 without an additional language model.
Contribution
The paper proposes a hierarchical multi-task training method that applies CTC at multiple levels of the network, with targets of different granularity at each level. This encourages useful intermediate representations and outperforms existing acoustic-to-word models.
Findings
Achieves 14.0% WER on Switchboard Eval2000 without language models.
Hierarchical multi-task training improves performance over single-task models.
Model outperforms current state-of-the-art acoustic-to-word systems.
Abstract
In Automatic Speech Recognition, it is still challenging to learn useful intermediate representations when using high-level (or abstract) target units such as words. For that reason, character- or phoneme-based systems tend to outperform word-based systems when only a few hundred hours of training data are used. In this paper, we first show how hierarchical multi-task training can encourage the formation of useful intermediate representations. We achieve this by performing Connectionist Temporal Classification at different levels of the network with targets of different granularity. Our model thus performs predictions at multiple scales for the same input. On the standard 300h Switchboard training setup, our hierarchical multi-task architecture exhibits improvements over single-task architectures with the same number of parameters. Our model obtains 14.0% Word Error Rate on the…
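The idea of applying CTC at several levels of the network can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the equal loss weights, and the character/word target pairing are assumptions made for illustration. The per-level frame log-probabilities stand in for the outputs of a lower and a higher layer of the same network, and each level is scored against targets of its own granularity with a plain CTC forward pass.

```python
import math

NEG_INF = float("-inf")


def _logsumexp(values):
    # Numerically stable log(sum(exp(v))) that tolerates -inf entries.
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """CTC loss for one utterance via the forward algorithm.

    log_probs: T frames, each a list of log-probabilities over the vocabulary.
    target:    label ids without blanks.
    """
    # Extended target with blanks interleaved: [blank, y1, blank, y2, ..., blank]
    ext = [blank]
    for label in target:
        ext += [label, blank]
    size = len(ext)

    alpha = [NEG_INF] * size
    alpha[0] = log_probs[0][ext[0]]
    if size > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new_alpha = [NEG_INF] * size
        for s in range(size):
            terms = [alpha[s]]
            if s >= 1:
                terms.append(alpha[s - 1])
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])
            new_alpha[s] = _logsumexp(terms) + log_probs[t][ext[s]]
        alpha = new_alpha

    # Valid paths end in the last label or the trailing blank.
    tail = [alpha[-1]] if size == 1 else [alpha[-1], alpha[-2]]
    return -_logsumexp(tail)


def hierarchical_ctc_loss(level_log_probs, level_targets, weights=None):
    """Weighted sum of one CTC loss per network level (equal weights assumed)."""
    if weights is None:
        weights = [1.0] * len(level_log_probs)
    return sum(
        w * ctc_neg_log_likelihood(lp, tgt)
        for w, lp, tgt in zip(weights, level_log_probs, level_targets)
    )


# Toy example: 3 frames, uniform predictions at both levels.
# Lower level (hypothetical): characters, vocabulary {blank, 'a', 'b'}, target "ab".
# Higher level (hypothetical): words, vocabulary {blank, 'ab'}, target "ab".
char_log_probs = [[math.log(1 / 3)] * 3 for _ in range(3)]
word_log_probs = [[math.log(1 / 2)] * 2 for _ in range(3)]
loss = hierarchical_ctc_loss([char_log_probs, word_log_probs], [[1, 2], [1]])
print(f"hierarchical CTC loss: {loss:.4f}")
```

In a real model, the per-level log-probabilities would come from softmax layers attached to different depths of a shared encoder, so that gradients from the fine-grained loss shape the lower layers while the coarse-grained loss trains the full stack.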
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques
