Fast Streaming Transducer ASR Prototyping via Knowledge Distillation with Whisper
Iuliia Thorbecke, Juan Zuluaga-Gomez, Esaú Villatoro-Tello, Shashi Kumar, Pradeep Rangappa, Sergio Burdisso, Petr Motlicek, Karthik Pandia, Aravind Ganapathiraju

TL;DR
This paper introduces a method to train streaming Transformer-Transducer (TT) ASR models from scratch using speech pseudo-labeled by foundational speech models, removing the need for large supervised datasets and a separate pre-training stage.
Contribution
It demonstrates that robust streaming ASR models can be trained from scratch with pseudo-labels on consumer GPUs, simplifying the training process and reducing resource requirements.
Findings
Streaming Transformer-Transducer (TT) models can be trained from scratch with pseudo-labeled data.
Shallow fusion of n-gram LMs improves model performance.
Contextual biasing enhances recognition of named entities.
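As a rough illustration of the last two findings, the sketch below combines ASR and n-gram LM log-probabilities for one decoding step (the log-linear interpolation commonly called shallow fusion) and optionally adds a fixed bonus to tokens from a bias list. It is a toy under stated assumptions, not the paper's implementation: the function name `fuse_step`, the weights, the `lm_floor` back-off value, and the whole-word "tokens" are all hypothetical, and a real contextual-biasing system matches multi-token phrases rather than single tokens.

```python
import math

def fuse_step(asr_logprobs, lm_logprobs, lm_weight=0.3,
              bias_tokens=(), bias_bonus=0.0, lm_floor=-10.0):
    """Score next-token candidates for one decoding step.

    score = log p_asr(token) + lm_weight * log p_lm(token) [+ bias_bonus]
    All parameter names and default values are illustrative assumptions.
    """
    fused = {}
    for token, asr_lp in asr_logprobs.items():
        # Tokens unseen by the LM get a fixed floor log-probability.
        lm_lp = lm_logprobs.get(token, lm_floor)
        score = asr_lp + lm_weight * lm_lp
        # Simplified contextual biasing: boost tokens on the bias list.
        if token in bias_tokens:
            score += bias_bonus
        fused[token] = score
    return fused

# Toy candidate distributions (made-up numbers for illustration only).
asr = {"their": math.log(0.5), "there": math.log(0.4), "Thorbecke": math.log(0.1)}
lm = {"their": math.log(0.1), "there": math.log(0.9)}

fused = fuse_step(asr, lm, lm_weight=0.5)
best_fused = max(fused, key=fused.get)

biased = fuse_step(asr, lm, lm_weight=0.5,
                   bias_tokens={"Thorbecke"}, bias_bonus=8.0)
best_biased = max(biased, key=biased.get)
```

In this toy setup the acoustic model alone prefers "their", the LM shifts the choice to "there", and the bias bonus lets a rare named entity win despite the LM never having seen it. In a real beam search the same combination would be applied to every hypothesis expansion.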
Abstract
The training of automatic speech recognition (ASR) with little to no supervised data remains an open question. In this work, we demonstrate that streaming Transformer-Transducer (TT) models can be trained from scratch, in their entirety, on consumer and accessible GPUs with pseudo-labeled (PL) speech from foundational speech models (FSM). This allows training a robust ASR model in just one stage and does not require large data and computational budgets, in contrast to the two-step scenario of pre-training and fine-tuning. We perform a comprehensive ablation on different aspects of PL-based streaming TT models, such as the impact of (1) shallow fusion of n-gram LMs, (2) contextual biasing with named entities, (3) chunk-wise decoding for low-latency streaming applications, and (4) overall TT performance as a function of the FSM size. Our results demonstrate that TT can be trained from scratch…
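To make the chunk-wise decoding idea in item (3) concrete, here is a minimal sketch of a streaming loop that feeds fixed-size chunks of feature frames to a stateful decoder and accumulates partial output. Everything here is an assumption for illustration: `iter_chunks`, `stream_decode`, and the `decode_chunk(chunk, state)` callback are hypothetical names, and the toy decoder only counts frames rather than running a Transformer-Transducer.

```python
def iter_chunks(frames, chunk_size):
    """Yield consecutive chunks of at most chunk_size frames."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(frames), chunk_size):
        yield frames[start:start + chunk_size]

def stream_decode(frames, chunk_size, decode_chunk):
    """Run a stateful decoder chunk by chunk and collect its outputs.

    decode_chunk is a hypothetical callback: (chunk, state) -> (tokens, state).
    The state carries whatever context the model keeps between chunks.
    """
    state = None
    partial_output = []
    for chunk in iter_chunks(frames, chunk_size):
        tokens, state = decode_chunk(chunk, state)
        partial_output.extend(tokens)  # output is available after every chunk
    return partial_output, state

def toy_decode_chunk(chunk, state):
    """Stand-in decoder: emits the chunk length and counts frames seen so far."""
    seen = (state or 0) + len(chunk)
    return [len(chunk)], seen

frames = list(range(10))  # placeholder for 10 acoustic feature frames
chunks = list(iter_chunks(frames, 4))
outputs, frames_seen = stream_decode(frames, 4, toy_decode_chunk)
```

The chunk size is the main latency knob in such a loop: smaller chunks yield output sooner but give the model less context per step.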
Taxonomy
Topics: Advanced Machining and Optimization Techniques · Manufacturing Process and Optimization · Robot Manipulation and Learning
