Impact of Dataset on Acoustic Models for Automatic Speech Recognition

Siddhesh Singh

arXiv:2203.13590·cs.LG·March 28, 2022

Open Access

TL;DR

This paper investigates how the size of training datasets influences the performance and computational costs of GMM-HMM acoustic models in automatic speech recognition, highlighting the importance of dataset scale.

Contribution

It provides an analysis of the effects of dataset size on acoustic model accuracy and resource requirements, an area previously lacking detailed study.

Findings

01 Larger datasets improve model accuracy

02 Smaller datasets increase risk of overfitting

03 Dataset size significantly affects computational costs

Abstract

In Automatic Speech Recognition, GMM-HMM models have been widely used for acoustic modelling. With recent advances in deep learning, the Gaussian Mixture Model (GMM) in acoustic models has been replaced with a Deep Neural Network, giving DNN-HMM acoustic models. GMM models are widely used to create the alignments of the training data for the hybrid deep neural network model, which makes creating accurate alignments an important task. Many factors, such as training dataset size, training data augmentation, and model hyperparameters, affect model learning. Traditionally in machine learning, models trained on larger datasets tend to perform better, while smaller datasets tend to trigger over-fitting. The collection of speech data and their accurate transcriptions is a significant challenge that varies over different languages, and in most cases, it might be limited to big…
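The general claim that smaller training sets tend to trigger over-fitting in a GMM can be illustrated with a toy sketch. This is not the paper's experiment: it assumes synthetic 1-D data rather than speech features, a hand-written EM loop rather than a speech toolkit, and all function names (`sample_mixture`, `fit_gmm`, `generalisation_gap`) are hypothetical. It fits a 4-component GMM on training sets of different sizes and reports the gap between the average training log-likelihood and the held-out log-likelihood.

```python
import math
import random


def sample_mixture(n, rng):
    """Draw n points from a fixed two-component 1-D Gaussian mixture (toy data)."""
    return [rng.gauss(-2.0, 1.0) if rng.random() < 0.5 else rng.gauss(2.0, 1.0)
            for _ in range(n)]


def component_densities(x, weights, means, variances):
    """Weighted Gaussian density of x under each mixture component."""
    return [w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in zip(weights, means, variances)]


def fit_gmm(data, k, rng, iters=30, var_floor=1e-2):
    """Fit a k-component 1-D GMM with plain EM; returns (weights, means, variances)."""
    weights = [1.0 / k] * k
    means = rng.sample(data, k)  # initialise means at random training points
    variances = [1.0] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in data:
            dens = component_densities(x, weights, means, variances)
            total = max(sum(dens), 1e-300)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and (floored) variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, var_floor)
    return weights, means, variances


def avg_loglik(data, params):
    """Average per-point log-likelihood of data under the fitted mixture."""
    weights, means, variances = params
    return sum(
        math.log(max(sum(component_densities(x, weights, means, variances)), 1e-300))
        for x in data
    ) / len(data)


def generalisation_gap(n_train, k=4, trials=5, seed=0):
    """Mean (train - held-out) log-likelihood gap over several random training sets."""
    rng = random.Random(seed)
    held_out = sample_mixture(1000, rng)
    gaps = []
    for _ in range(trials):
        train = sample_mixture(n_train, rng)
        params = fit_gmm(train, k, rng)
        gaps.append(avg_loglik(train, params) - avg_loglik(held_out, params))
    return sum(gaps) / trials


if __name__ == "__main__":
    for n in (20, 200, 1000):
        print(f"n_train={n:5d}  train/held-out gap={generalisation_gap(n):.3f}")
```

A larger gap means the model fits its training points much better than unseen data. In this toy setting the gap is expected to shrink as the training set grows, while the cost of each EM pass grows linearly with the number of training points.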

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing