Enhancing Quantised End-to-End ASR Models via Personalisation

Qiuming Zhao, Guangzhi Sun, Chao Zhang, Mingxing Xu and Thomas Fang Zheng

arXiv:2309.09136 · cs.SD · September 19, 2023



TL;DR

This paper introduces a personalisation strategy for quantised end-to-end ASR models that combines speaker adaptive training with model quantisation, reducing model size about 7x while improving the recognition accuracy of the quantised models.

Contribution

The paper proposes PQM (personalisation for a quantised model), which integrates NF4 quantisation with LoRA-based speaker personalisation to improve the accuracy of quantised ASR models on resource-constrained devices.
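To make the NF4 half of PQM concrete, here is a minimal, dependency-free sketch of blockwise 4-bit NormalFloat quantisation. It assumes the construction popularised by QLoRA (16 code values taken from standard-normal quantiles, 8 positive, 7 negative plus an exact zero, normalised to [-1, 1]) and an absmax scale per block of 64 weights; the offset, block size and function names are illustrative assumptions, not taken from the PQM repository.

```python
import random
from statistics import NormalDist


def nf4_levels(offset: float = 0.9677083) -> list[float]:
    """16 NF4 code values in [-1, 1] built from standard-normal quantiles.
    Assumption: offset and the 8/7 asymmetric split follow the QLoRA construction."""
    nd = NormalDist()

    def linspace(a: float, b: float, n: int) -> list[float]:
        return [a + (b - a) * i / (n - 1) for i in range(n)]

    # 8 strictly positive quantiles (p = 0.5 is dropped because it maps to 0).
    pos = [nd.inv_cdf(p) for p in linspace(offset, 0.5, 9)[:-1]]
    # 7 strictly negative quantiles.
    neg = [-nd.inv_cdf(p) for p in linspace(offset, 0.5, 8)[:-1]]
    levels = sorted(neg + [0.0] + pos)
    peak = max(abs(v) for v in levels)
    return [v / peak for v in levels]


LEVELS = nf4_levels()


def quantise_blockwise(weights, block_size=64):
    """Return (codes, scales): a 4-bit code per weight and an absmax scale per block."""
    codes, scales = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        scale = max(abs(w) for w in block)
        scales.append(scale)
        for w in block:
            x = w / scale if scale > 0 else 0.0
            # Index (0..15) of the nearest NF4 level is the stored 4-bit code.
            codes.append(min(range(16), key=lambda i: abs(LEVELS[i] - x)))
    return codes, scales


def dequantise_blockwise(codes, scales, block_size=64):
    """Map each code back to its NF4 level and rescale by its block's absmax."""
    return [LEVELS[c] * scales[i // block_size] for i, c in enumerate(codes)]


# Round-trip a toy weight vector drawn from a narrow Gaussian.
random.seed(0)
w = [random.gauss(0.0, 0.02) for _ in range(256)]
codes, scales = quantise_blockwise(w)
w_hat = dequantise_blockwise(codes, scales)
```

Because the levels are normal quantiles, they are denser near zero, where roughly Gaussian weights concentrate; the per-weight reconstruction error is bounded by half the largest gap between adjacent levels times the block scale.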

Findings

1. Achieved 15.1% and 23.3% relative WER reductions on quantised Whisper and a second quantised ASR model.

2. Reduced model size by 7x with only 1% additional speaker-specific parameters.

3. Demonstrated effectiveness on the LibriSpeech and TED-LIUM 3 datasets.
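The size figures above can be sanity-checked with simple bit accounting. The sketch below is one plausible accounting, not the paper's reported breakdown: it assumes a 32-bit full-precision baseline, 4-bit codes, and one 32-bit absmax scale per block of 64 weights, and it uses a hypothetical 1024 x 1024 weight with a rank-4 adapter to show how LoRA stays near 1% of a layer's parameters. The whole-model ratio in practice also depends on which layers remain unquantised and which receive adapters.

```python
def compression_ratio(base_bits=32, quant_bits=4, block_size=64, scale_bits=32):
    """Storage ratio of a full-precision tensor vs. a blockwise-quantised one,
    counting one full-precision absmax scale per block (assumed layout)."""
    effective_bits = quant_bits + scale_bits / block_size
    return base_bits / effective_bits


def lora_fraction(d_out, d_in, rank):
    """Extra parameters of a rank-r adapter (B: d_out x r, A: r x d_in)
    relative to the frozen d_out x d_in weight it adapts."""
    return rank * (d_out + d_in) / (d_out * d_in)


print(f"{compression_ratio():.2f}x")           # → 7.11x  (32 / 4.5 bits)
print(f"{lora_fraction(1024, 1024, 4):.2%}")   # → 0.78%  (hypothetical sizes)
```

The per-block scale is why 4-bit storage lands nearer 7x than the nominal 8x under these assumptions.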

Abstract

Recent end-to-end automatic speech recognition (ASR) models have grown increasingly large, making them particularly challenging to deploy on resource-constrained devices. Model quantisation is an effective solution, but it sometimes increases the word error rate (WER). In this paper, a novel strategy of personalisation for a quantised model (PQM) is proposed, which combines speaker adaptive training (SAT) with model quantisation to improve the performance of heavily compressed models. Specifically, PQM uses 4-bit NormalFloat quantisation (NF4) for model quantisation and low-rank adaptation (LoRA) for SAT. Experiments were performed on the LibriSpeech and TED-LIUM 3 corpora. Remarkably, with a 7x reduction in model size and 1% additional speaker-specific parameters, 15.1% and 23.3% relative WER reductions were achieved on quantised Whisper and…
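The LoRA side of PQM can be pictured as a frozen, shared base weight plus a small trainable low-rank update per speaker. The following plain-Python sketch shows that structure under stated assumptions: the base matrix stands in for a dequantised NF4 weight, and the rank, alpha, initialisation scale, seeds and speaker ids are hypothetical. It follows the standard LoRA formulation (A random, B zero, update scaled by alpha / rank), not the authors' PyTorch code, and it omits the SAT training loop, which would update only A and B.

```python
import random


def matvec(M, x):
    """Plain-Python matrix-vector product (M given as a list of rows)."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update:
    y = W x + (alpha / rank) * B (A x)."""

    def __init__(self, W, rank=4, alpha=8.0, seed=0):
        rng = random.Random(seed)
        self.W = W  # frozen and shared; assumed to be a dequantised NF4 weight
        d_out, d_in = len(W), len(W[0])
        self.scaling = alpha / rank
        # Standard LoRA init: A random, B zero, so the adapter starts as a no-op.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def trainable_params(self):
        """Only A and B would be updated during speaker adaptive training."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def forward(self, x):
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scaling * d for b, d in zip(base, delta)]


# One shared frozen base weight; one small adapter per (hypothetical) speaker.
W = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1]]
speakers = {spk: LoRALinear(W, rank=2, alpha=4.0, seed=i)
            for i, spk in enumerate(["spk_a", "spk_b"])}
x = [1.0, 2.0, 3.0]
print(speakers["spk_a"].forward(x))          # equals matvec(W, x) until B is trained
print(speakers["spk_a"].trainable_params())  # → 10  (A: 2x3, B: 2x2)
```

Keeping W shared means each additional speaker costs only the adapter parameters, which is what lets personalisation sit on top of a heavily compressed model.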


Code & Models

Repositories

qmgzhao/pqm (PyTorch, official)


Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing