Latency-Controlled Neural Architecture Search for Streaming Speech Recognition
Liqiang He, Shulin Feng, Dan Su, Dong Yu

TL;DR
This paper introduces a latency-controlled neural architecture search (NAS) method for streaming speech recognition. The searched low-latency network outperforms a hybrid CLDNN baseline by more than 19% relative, averaged over four test sets.
Contribution
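It proposes a latency-controlled NAS framework that replaces normal cells with causal cells to control the total latency of the architecture, and revises the operation space to use a smaller receptive field, producing low-latency architectures for streaming ASR.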
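To make the link between receptive field and latency concrete, here is a minimal sketch. It is not the authors' implementation. It assumes that latency is the network's total future context (lookahead) multiplied by the frame shift, that the frame shift is 10 ms, and that the operations form a simple sequential stack of 1-D convolutions rather than a searched cell graph. The names `ConvOp`, `lookahead_frames`, and `latency_ms` are hypothetical, as are the kernel sizes in the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConvOp:
    """A 1-D convolution described only by the properties that affect lookahead."""
    kernel_size: int          # number of taps; assumed odd for symmetric ops
    dilation: int = 1
    stride: int = 1
    causal: bool = False      # a causal op sees no future frames

    def right_context(self) -> int:
        """Future frames needed per output, counted at this op's input rate."""
        if self.causal:
            return 0
        return (self.kernel_size - 1) // 2 * self.dilation


def lookahead_frames(ops: Sequence[ConvOp]) -> int:
    """Total future context of a sequential stack, in network-input frames."""
    total = 0
    rate = 1  # how many input frames one frame at the current depth spans
    for op in ops:
        total += op.right_context() * rate
        rate *= op.stride
    return total


def latency_ms(ops: Sequence[ConvOp], frame_shift_ms: float = 10.0) -> float:
    """Algorithmic latency only; the 10 ms frame shift is an assumption."""
    return lookahead_frames(ops) * frame_shift_ms


# Hypothetical stacks: symmetric ops, a smaller-kernel variant, and all-causal ops.
symmetric = [ConvOp(5), ConvOp(5, dilation=2), ConvOp(3)]
smaller = [ConvOp(3), ConvOp(3), ConvOp(3)]
causal = [ConvOp(5, causal=True), ConvOp(5, dilation=2, causal=True)]

print(latency_ms(symmetric))  # 70.0
print(latency_ms(smaller))    # 30.0
print(latency_ms(causal))     # 0.0
```

Under these assumptions, swapping a symmetric operation for a causal one removes its lookahead entirely, while shrinking kernels or dilations reduces it. These are the two levers the contribution describes. The sketch counts only algorithmic lookahead and ignores compute time.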
Findings
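The search yields a medium-latency architecture (550 ms) in the vanilla operation space and a low-latency architecture (190 ms) in the revised operation space.
In the low-latency setting, the evaluation network outperforms the hybrid CLDNN baseline by more than 19% relative, averaged over four test sets.
The method is shown to be effective on a large-scale 10k-hour dataset.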
Abstract
Neural architecture search (NAS) has attracted much attention and has been explored for automatic speech recognition (ASR). In this work, we focus on streaming ASR scenarios and propose the latency-controlled NAS for acoustic modeling. First, based on the vanilla neural architecture, normal cells are altered to causal cells to control the total latency of the architecture. Second, a revised operation space with a smaller receptive field is proposed to generate the final architecture with low latency. Extensive experiments show that: 1) Based on the proposed neural architecture, the neural networks with a medium latency of 550ms (millisecond) and a low latency of 190ms can be learned in the vanilla and revised operation space respectively. 2) For the low latency setting, the evaluation network can achieve more than 19% (average on the four test sets) relative improvements compared with…
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
