Learned Transferable Architectures Can Surpass Hand-Designed Architectures for Large Scale Speech Recognition

Liqiang He; Dan Su; Dong Yu

arXiv:2008.11589 · eess.AS · May 11, 2021 · 1 cites

TL;DR

This paper demonstrates that neural architecture search (NAS) can find transferable speech recognition architectures that outperform hand-designed models on large datasets, reducing computational costs and enabling industrial-scale applications.

Contribution

The study introduces a revised search space for NAS in speech recognition and shows that architectures found on small datasets transfer effectively to large-scale datasets, outperforming hand-designed models.

Findings

1. Architectures transferred from the small proxy dataset outperform hand-designed models on large datasets.

2. Architectures learned in the revised search space greatly reduce computational overhead and GPU memory usage.

3. NAS architectures achieve over 20% relative improvement on AISHELL-2 compared with the hand-designed models.
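
For readers unfamiliar with the term, a "relative improvement" in speech recognition is usually read as the relative reduction in an error rate with respect to a baseline. The sketch below assumes that reading. The function name `relative_improvement` and the numbers are hypothetical and chosen only to illustrate the arithmetic; they are not results from the paper.

```python
def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Relative error-rate reduction of new_error with respect to baseline_error.

    Assumes both arguments are error rates in the same unit (for example,
    percent), so that lower is better.
    """
    if baseline_error <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical error rates for illustration only, not figures from the paper.
baseline = 10.0   # hand-designed model
searched = 8.0    # NAS-searched model
print(f"relative improvement: {relative_improvement(baseline, searched):.0%}")
```

Under this definition, an absolute drop of 2 points from a 10-point baseline counts as a 20% relative improvement, which is why relative and absolute gains should not be compared directly.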

Abstract

In this paper, we explore neural architecture search (NAS) for automatic speech recognition (ASR) systems. Following previous work in the computer vision field, we focus on the transferability of the searched architecture: the architecture search is conducted on a small proxy dataset, and the evaluation network, constructed with the searched architecture, is then evaluated on a large dataset. In particular, we propose a revised search space for speech recognition tasks that, in theory, helps the search algorithm explore low-complexity architectures. Extensive experiments show that (i) the architecture searched on the small proxy dataset can be transferred to the large dataset for speech recognition tasks; (ii) the architecture learned in the revised search space can greatly reduce the computational overhead and GPU memory…

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques