Latency-Controlled Neural Architecture Search for Streaming Speech Recognition

Liqiang He; Shulin Feng; Dan Su; Dong Yu

arXiv:2105.03643·eess.AS·September 15, 2021

TL;DR

This paper introduces a latency-controlled neural architecture search method tailored for streaming speech recognition, enabling the design of low-latency neural networks with significant accuracy improvements on large-scale datasets.

Contribution

It proposes a latency-controlled NAS framework that replaces normal cells with causal cells and uses a revised operation space with a smaller receptive field, producing low-latency architectures for streaming ASR.

Findings

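1. The method learns a medium-latency architecture (550 ms) in the vanilla operation space and a low-latency architecture (190 ms) in the revised operation space.

2. In the low-latency setting, the architecture outperforms the hybrid CLDNN baseline by more than 19% relative, averaged over four test sets.

3. The method's effectiveness is demonstrated on a large-scale 10k-hour dataset.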

Abstract

Neural architecture search (NAS) has attracted much attention and has been explored for automatic speech recognition (ASR). In this work, we focus on streaming ASR scenarios and propose the latency-controlled NAS for acoustic modeling. First, based on the vanilla neural architecture, normal cells are altered to causal cells to control the total latency of the architecture. Second, a revised operation space with a smaller receptive field is proposed to generate the final architecture with low latency. Extensive experiments show that: 1) Based on the proposed neural architecture, the neural networks with a medium latency of 550 ms and a low latency of 190 ms can be learned in the vanilla and revised operation space respectively. 2) For the low latency setting, the evaluation network can achieve more than 19% (average on the four test sets) relative improvements compared with…
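
To illustrate why causal cells and a smaller receptive field lower latency, here is a minimal sketch of how algorithmic look-ahead accumulates in a stack of 1-D convolutions. It is not the paper's architecture or latency formula: the `ConvLayer` class, the `lookahead_ms` helper, the 10 ms frame shift, the stride of 1, and the example kernel sizes are all assumptions made for illustration, and the numbers are not meant to reproduce the 550 ms or 190 ms figures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ConvLayer:
    """Hypothetical 1-D convolution description (stride 1, odd kernel size)."""
    kernel_size: int
    dilation: int = 1
    causal: bool = False

    def right_context(self) -> int:
        """Number of future input frames this layer needs."""
        if self.causal:
            # A causal layer only looks at current and past frames.
            return 0
        # Symmetric padding: half of the dilated kernel span looks ahead.
        return ((self.kernel_size - 1) // 2) * self.dilation


def lookahead_ms(layers: List[ConvLayer], frame_shift_ms: float = 10.0) -> float:
    """Total look-ahead of the stack, assuming every layer has stride 1.

    Future context adds up across layers, so the total is the sum of the
    per-layer right contexts times the (assumed) frame shift.
    """
    return sum(layer.right_context() for layer in layers) * frame_shift_ms


# Illustrative stack: two symmetric layers followed by one causal layer.
stack = [ConvLayer(3), ConvLayer(5, dilation=2), ConvLayer(3, causal=True)]
print(lookahead_ms(stack))  # 1 + 4 + 0 = 5 frames -> 50.0
```

Under these assumptions, making a layer causal removes its contribution entirely, while shrinking a kernel or its dilation reduces it, which matches the two levers the abstract describes.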

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing