Extremely Low Footprint End-to-End ASR System for Smart Device

Zhifu Gao; Yiwu Yao; Shiliang Zhang; Jun Yang; Ming Lei; Ian McLoughlin

arXiv:2104.05784 · cs.SD · July 8, 2021

TL;DR

This paper presents a highly efficient end-to-end speech recognition system optimized for smart devices, combining model compression and weight sharing to drastically reduce size with minimal accuracy loss.

Contribution

The paper introduces a low-footprint E2E ASR model that combines cross-layer weight sharing with model compression (sparsification and quantization), enabling deployment on resource-constrained devices.
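The idea behind cross-layer weight sharing can be sketched in a few lines: several layers in a stack reuse one set of weights, so the stored parameter count no longer grows with depth. The sketch below is a generic illustration only. The `Linear` class, the helper names, the layer count, and the dimension are all hypothetical, and this page does not say which layers the paper actually shares.

```python
import random


class Linear:
    """Toy square dense layer; the weight matrix is a list of rows (illustrative only)."""

    def __init__(self, dim, rng):
        self.weight = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]

    def __call__(self, x):
        # Plain matrix-vector product, no bias or activation.
        return [sum(w * v for w, v in zip(row, x)) for row in self.weight]


def build_stack(num_layers, dim, share, seed=0):
    """Build a stack of layers; with share=True every position reuses one layer object."""
    rng = random.Random(seed)
    if share:
        layer = Linear(dim, rng)
        return [layer] * num_layers
    return [Linear(dim, rng) for _ in range(num_layers)]


def count_unique_params(stack):
    """Count stored weights, counting each distinct layer object once."""
    seen = {}
    for layer in stack:
        seen[id(layer)] = len(layer.weight) * len(layer.weight[0])
    return sum(seen.values())


def forward(stack, x):
    """Apply the layers in order; a shared stack applies the same weights repeatedly."""
    for layer in stack:
        x = layer(x)
    return x


# Hypothetical sizes chosen only to keep the numbers small.
baseline = build_stack(num_layers=6, dim=8, share=False)
shared = build_stack(num_layers=6, dim=8, share=True)

print(count_unique_params(baseline))  # 6 layers * 8 * 8 weights = 384
print(count_unique_params(shared))    # one 8 * 8 matrix reused = 64
```

Both stacks run the same number of layer applications in `forward`, so sharing in this sketch reduces storage rather than compute.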

Findings

01 Achieves over 10x model size reduction on AISHELL-2

02 Maintains near-original accuracy with only 0.43% CER increase

03 Demonstrates effective deployment feasibility on smart devices

Abstract

Recently, end-to-end (E2E) speech recognition has become popular because it can integrate the acoustic, pronunciation, and language models into a single neural network that outperforms conventional models. Among E2E approaches, attention-based models, e.g. Transformer, have emerged as superior. Such models have opened the door to deploying ASR on smart devices; however, they still require a large number of model parameters. We propose an extremely low footprint E2E ASR system for smart devices that satisfies resource constraints without sacrificing recognition accuracy. We design cross-layer weight sharing to improve parameter efficiency, and further exploit model compression methods, including sparsification and quantization, to reduce memory storage and boost decoding efficiency. We evaluate our approaches on the public AISHELL-1 and AISHELL-2…
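The two compression steps named in the abstract, sparsification and quantization, can be illustrated on a flat list of weights. This is a minimal sketch under stated assumptions: it uses magnitude pruning and symmetric per-tensor 8-bit quantization as generic stand-ins, since this page does not specify the paper's actual methods. The function names and the example weights are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map quantized integers back to approximate float weights."""
    return [v * scale for v in q]


# Made-up weights for illustration; not taken from any model.
weights = [0.02, -0.5, 0.31, -0.04, 0.9, 0.07, -0.66, 0.001]

pruned = magnitude_prune(weights, sparsity=0.5)   # half of the entries become 0.0
q, scale = quantize_int8(pruned)                  # integers plus one float scale
restored = dequantize(q, scale)                   # approximation of `pruned`

print(pruned)
print(q)
```

Storing each surviving weight as an 8-bit integer instead of a 32-bit float cuts its storage by a factor of four, and the zeros introduced by pruning can be skipped or stored compactly. How these gains combine in the paper's system is not described on this page.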

Code & Models

Repositories

alibaba-damo-academy/FunASR · PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing