
TL;DR
This paper demonstrates that acoustic models trained with an auto-predictive coding loss follow scaling laws similar to those of autoregressive models, so their performance limits can be predicted from model size, training set size, and the irreducible loss of the task.
Contribution
It extends existing scaling-law frameworks to acoustic models trained with an auto-predictive coding loss, and jointly predicts the loss due to model size, training set size, and the irreducible loss of the task.
Findings
Scaling laws accurately match acoustic model performance over two orders of magnitude in both model size and training set size.
The fitted scaling laws estimate the irreducible loss of the task and predict the limits of acoustic model performance.
Scaling laws help optimize hyper-parameters under data and compute constraints.
Abstract
There is a recent trend in machine learning to increase model quality by growing models to sizes previously thought to be unreasonable. Recent work has shown that autoregressive generative models with cross-entropy objective functions exhibit smooth power-law relationships, or scaling laws, that predict model quality from model size, training set size, and the available compute budget. These scaling laws allow one to choose nearly optimal hyper-parameters given constraints on available training data, model parameter count, or training computation budget. In this paper, we demonstrate that acoustic models trained with an auto-predictive coding loss behave as if they are subject to similar scaling laws. We extend previous work to jointly predict loss due to model size, to training set size, and to the inherent "irreducible loss" of the task. We find that the scaling laws accurately match…
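To make the idea of a joint scaling law concrete, here is a minimal sketch in Python. It assumes a simple additive form: an irreducible loss plus one power-law term in model size and one in training set size. This form, the function name `predicted_loss`, and every constant below are illustrative assumptions. They are not the paper's fitted law or values.

```python
import numpy as np


def predicted_loss(n_params, n_samples, l_inf, n_c, alpha_n, d_c, alpha_d):
    """Hypothetical additive scaling law (an assumed form, not the paper's).

    loss = l_inf + (n_c / n_params) ** alpha_n + (d_c / n_samples) ** alpha_d

    l_inf is the irreducible loss; the two power-law terms shrink as the
    model size and the training set size grow.
    """
    n_params = np.asarray(n_params, dtype=float)
    n_samples = np.asarray(n_samples, dtype=float)
    return l_inf + (n_c / n_params) ** alpha_n + (d_c / n_samples) ** alpha_d


# Made-up constants chosen only for illustration.
PARAMS = dict(l_inf=0.5, n_c=1e4, alpha_n=0.3, d_c=1e5, alpha_d=0.25)

# Sweep model size over two orders of magnitude at a fixed training set size.
sizes = np.logspace(5, 7, 3)  # 1e5, 1e6, 1e7 parameters
losses = predicted_loss(sizes, 1e8, **PARAMS)

for n, loss in zip(sizes, losses):
    print(f"params={n:.0e}  predicted loss={loss:.4f}")
```

Under this assumed form, the predicted loss falls as either quantity grows but never drops below `l_inf`, which is how such a law places a limit on achievable performance.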
