Scaling Laws for Discriminative Speech Recognition Rescoring Models

Yile Gu; Prashanth Gurunath Shivakumar; Jari Kolehmainen; Ankur Gandhe; Ariya Rastrow; Ivan Bulyko

arXiv:2306.15815 · eess.AS · June 29, 2023 · Interspeech

TL;DR

This paper shows that discriminative speech recognition rescoring models, specifically RescoreBERT, follow scaling laws: word error rate improves as a power law in training data and model size, and pre-training reduces the amount of data needed.

Contribution

The paper extends scaling laws to second-pass speech recognition rescoring models, showing that their WER follows power-law relationships and that pre-training reduces data requirements.

Findings

01 WER follows a power law in training data and model size

02 Pre-trained models require less data than randomly initialized ones

03 Effective data transfer from pre-training also follows a scaling law
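
To make the first finding concrete, here is a minimal sketch of how a power-law relationship between WER and training data size could be fitted. It assumes a pure power law of the form WER ≈ a · D^(−b) with no irreducible-error term, which may differ from the exact functional form used in the paper. The function names and the data points are hypothetical and synthetic; they are not results from the paper.

```python
import numpy as np


def fit_power_law(sizes, wers):
    """Fit wer ~= a * size**(-b) by least squares in log-log space.

    Assumption: a pure power law, so log(wer) is linear in log(size).
    """
    slope, intercept = np.polyfit(np.log(sizes), np.log(wers), deg=1)
    a = float(np.exp(intercept))
    b = float(-slope)
    return a, b


def predict_wer(size, a, b):
    """Evaluate the fitted power law at a given training data size."""
    return a * size ** (-b)


# Synthetic, illustrative points generated from a known power law
# (a = 50, b = 0.1). These are NOT measurements from the paper.
sizes = np.array([1e4, 1e5, 1e6, 1e7])
wers = 50.0 * sizes ** (-0.1)

a, b = fit_power_law(sizes, wers)
extrapolated = predict_wer(1e8, a, b)
```

On a log-log plot such a relationship appears as a straight line with slope −b, which is why a linear fit on the logarithms is enough here. The same routine could be applied with model size in place of data size.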

Abstract

Recent studies have found that model performance has a smooth power-law relationship, or scaling laws, with training data and model size, for a wide range of problems. These scaling laws allow one to choose nearly optimal data and model sizes. We study whether this scaling property is also applicable to second-pass rescoring, which is an important component of speech recognition systems. We focus on RescoreBERT as the rescoring model, which uses a pre-trained Transformer-based architecture fine-tuned with an ASR discriminative loss. Using such a rescoring model, we show that the word error rate (WER) follows a scaling law for over two orders of magnitude as training data and model size increase. In addition, we find that a pre-trained model requires less data than a randomly initialized model of the same size, representing effective data transferred from the pre-training step.…
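
The notion of effective data transferred can be illustrated with a short sketch. It assumes one common definition from the transfer scaling-law literature: the extra data a randomly initialized model of the same size would need to match the WER that the pre-trained model reaches on a given fine-tuning set. It also assumes the from-scratch model follows WER = a · D^(−b). All names and numbers below are hypothetical and are not taken from the paper.

```python
def data_needed_from_scratch(target_wer, a, b):
    """Invert wer = a * D**(-b) to get the data size D that reaches target_wer.

    Assumption: the randomly initialized model follows a pure power law.
    """
    return (a / target_wer) ** (1.0 / b)


def effective_data_transferred(pretrained_wer, finetune_size, a, b):
    """Extra data a from-scratch model needs to match the pre-trained one.

    Assumed definition: D_T = D_E - D_F, where D_F is the fine-tuning data
    size and D_E is the data a from-scratch model needs for the same WER.
    """
    d_effective = data_needed_from_scratch(pretrained_wer, a, b)
    return d_effective - finetune_size


# Hypothetical power-law coefficients for the from-scratch model.
a, b = 50.0, 0.1

# Hypothetical scenario: fine-tuning on 1e5 examples, the pre-trained model
# reaches the WER a from-scratch model would only reach with 4e5 examples.
finetune_size = 1e5
pretrained_wer = a * (4e5) ** (-b)

d_t = effective_data_transferred(pretrained_wer, finetune_size, a, b)
# By construction, d_t is close to 3e5 here.
```

Under this definition, a positive value means pre-training is worth that many additional training examples for the rescoring task.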

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Focus