WavFT: Acoustic model finetuning with labelled and unlabelled data

Utkarsh Chauhan; Vikas Joshi; Rupesh R. Mehta

arXiv:2204.00348·cs.CL·April 4, 2022

Open Access

TL;DR

This paper introduces a novel acoustic model finetuning method that leverages both labelled and unlabelled data during the finetuning stage, reducing the need for large-scale pretraining and improving speech recognition accuracy.

Contribution

The paper proposes a joint training approach combining classification and contrastive losses for acoustic model finetuning with labelled and unlabelled data, outperforming traditional methods.

Findings

01 11.2% word error rate relative (WERR) reduction on Gujarati over conventional finetuning

02 9.19% WERR reduction on Bengali over conventional finetuning

03 Effective use of unlabelled data during finetuning

Abstract

Unsupervised and self-supervised learning methods have leveraged unlabelled data to improve pretrained models. However, these methods need very large amounts of unlabelled data, and the computational cost of training models on that much data can be prohibitively high. We address this issue by using unlabelled data during finetuning instead of pretraining. We propose acoustic model finetuning (FT) using labelled and unlabelled data. The model is jointly trained to learn representations that classify senones and to learn contextual acoustic representations. Our training objective combines a cross-entropy loss, suited to the classification task, with a contrastive loss, suited to learning acoustic representations. The proposed approach outperforms conventional finetuning with 11.2% and 9.19% word error rate relative (WERR) reduction on Gujarati and Bengali…
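The combined objective described in the abstract can be illustrated with a minimal NumPy sketch. The abstract names only the two loss terms, so everything else here is an assumption made for illustration and not the authors' formulation: the weighting factor `alpha`, the InfoNCE-style form of the contrastive term, the `temperature` value, the use of other rows in the batch as negatives, and all function names.

```python
import numpy as np


def log_softmax(x, axis=-1):
    """Numerically stable log-softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def cross_entropy(logits, labels):
    """Mean cross entropy over labelled frames.

    logits: (N, C) senone scores, labels: (N,) integer senone ids.
    """
    logp = log_softmax(logits, axis=1)
    return float(-logp[np.arange(len(labels)), labels].mean())


def contrastive_loss(context, targets, temperature=0.1):
    """InfoNCE-style loss (assumed form, not taken from the paper).

    context[i] should be most similar to targets[i]; the other rows of
    `targets` in the batch act as negatives.
    """
    c = context / np.linalg.norm(context, axis=1, keepdims=True)
    t = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    sim = (c @ t.T) / temperature          # (N, N) cosine similarities
    logp = log_softmax(sim, axis=1)
    return float(-np.diag(logp).mean())    # positives sit on the diagonal


def joint_loss(senone_logits, senone_labels, context, targets, alpha=0.5):
    """Cross entropy on labelled frames plus weighted contrastive loss
    on unlabelled frames. `alpha` is a hypothetical weighting."""
    ce = cross_entropy(senone_logits, senone_labels)
    cl = contrastive_loss(context, targets)
    return ce + alpha * cl


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(6, 10))       # 6 labelled frames, 10 senones
    labels = rng.integers(0, 10, size=6)
    context = rng.normal(size=(8, 16))      # 8 unlabelled frames, 16-dim
    targets = context + 0.1 * rng.normal(size=(8, 16))
    print(joint_loss(logits, labels, context, targets))
```

In this sketch the labelled and unlabelled batches are independent, so either term can be computed on its own; setting `alpha=0` reduces the objective to plain supervised finetuning with cross entropy.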

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research