Neural Architecture Search For LF-MMI Trained Time Delay Neural Networks

Shoukang Hu; Xurong Xie; Mingyu Cui; Jiajun Deng; Shansong Liu; Jianwei Yu; Mengzhe Geng; Xunying Liu; Helen Meng

arXiv:2201.03943 · eess.AS · March 30, 2022 · 1 citation

TL;DR

This paper uses neural architecture search to automatically optimize factored time delay neural network (TDNN-F) acoustic models for speech recognition, reporting word error rate reductions of up to 1.2% absolute and a 31% reduction in model size over traditional systems.

Contribution

The paper introduces NAS methods tailored to TDNN-Fs that integrate architecture learning with lattice-free MMI (LF-MMI) training, and it reports gains in both recognition accuracy and model compactness.

Findings

01 Up to 1.2% absolute WER reduction

02 31% reduction in model size

03 State-of-the-art WERs on benchmark datasets

Abstract

State-of-the-art automatic speech recognition (ASR) system development is data- and computation-intensive. The optimal design of deep neural networks (DNNs) for these systems often requires expert knowledge and empirical evaluation. In this paper, a range of neural architecture search (NAS) techniques are used to automatically learn two types of hyper-parameters of factored time delay neural networks (TDNN-Fs): i) the left and right splicing context offsets; and ii) the dimensionality of the bottleneck linear projection at each hidden layer. These techniques include the differentiable neural architecture search (DARTS) method integrating architecture learning with lattice-free MMI training; Gumbel-Softmax and pipelined DARTS methods reducing the confusion over candidate architectures and improving the generalization of architecture selection; and Penalized DARTS incorporating resource…
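The selection mechanism the abstract describes can be illustrated with a small, framework-free sketch. It is not the authors' implementation: the candidate bottleneck dimensions, the logit values, and all function names below are hypothetical, and plain Python lists stand in for network activations. It shows a DARTS-style softmax weighting over candidate choices, a Gumbel-Softmax sample whose temperature controls how sharply one candidate dominates, and the final pick of the highest-scoring candidate.

```python
import math
import random

# Hypothetical candidate bottleneck dimensions for one TDNN-F layer
# (illustrative values, not taken from the paper).
CANDIDATE_DIMS = [25, 50, 80, 100, 120, 160]


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over architecture logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax(logits, temperature, rng):
    """Add Gumbel(0, 1) noise to the logits, then apply a tempered softmax.

    Lower temperatures push the result towards a one-hot choice.
    """
    noisy = []
    for x in logits:
        u = max(rng.random(), 1e-12)  # guard against log(0)
        noisy.append(x - math.log(-math.log(u)))
    return softmax(noisy, temperature)


def mix_candidates(weights, candidate_outputs):
    """DARTS-style mixture: weighted sum of the candidate outputs."""
    size = len(candidate_outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, candidate_outputs))
        for i in range(size)
    ]


def select_dim(logits, dims=CANDIDATE_DIMS):
    """After the search, keep the candidate with the largest logit."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return dims[best]


if __name__ == "__main__":
    rng = random.Random(0)
    arch_logits = [0.1, 0.4, -0.2, 1.3, 0.0, 0.5]  # hypothetical learned values
    print("softmax weights:", softmax(arch_logits))
    print("gumbel-softmax sample:", gumbel_softmax(arch_logits, 0.5, rng))
    print("selected bottleneck dimension:", select_dim(arch_logits))
```

In an actual system the logits would be trained jointly with the network weights under the LF-MMI criterion, and the same weighting could be applied to candidate context offsets; the sketch covers only the weighting and selection step.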

Code & Models

Repositories

skhu101/tdnn-f_nas (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Neural Networks and Applications

Methods: Differentiable Architecture Search