Acoustic Model Optimization Based On Evolutionary Stochastic Gradient Descent with Anchors for Automatic Speech Recognition
Xiaodong Cui, Michael Picheny

TL;DR
This paper introduces a variant of evolutionary stochastic gradient descent (ESGD) that places a well-trained acoustic model in the population as an anchor. For automatic speech recognition, this improves both the training loss and the ASR performance of the acoustic model.
Contribution
It proposes a novel ESGD variant that places anchor models in the parent population, guaranteeing that the best fitness of the population never becomes worse than that of the anchor during optimization.
Findings
Improved loss and ASR performance on the BN50 and SWB300 datasets.
Anchor-based ESGD outperforms traditional ESGD and the existing well-trained acoustic models.
Guarantees that the best fitness of the population never becomes worse than that of the anchor model.
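The guarantee can be stated in one line under a few assumptions of ours that the abstract does not spell out: fitness is treated as a loss L to be minimized, the anchor model θ_a is included in the candidate pool C_k at every generation k, and selection keeps the μ lowest-loss candidates of C_k as the new population P_k. This is a sketch of why such a guarantee can hold, not the paper's proof.

```latex
\theta_a \in C_k \;\Longrightarrow\;
\min_{\theta \in P_k} \mathcal{L}(\theta) \;=\; \min_{\theta \in C_k} \mathcal{L}(\theta) \;\le\; \mathcal{L}(\theta_a),
\qquad k = 1, 2, \dots
```

The equality holds because keeping the μ best candidates always keeps the single best one. The inequality holds because the anchor is itself one of the candidates.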
Abstract
Evolutionary stochastic gradient descent (ESGD) was proposed as a population-based approach that combines the merits of gradient-aware and gradient-free optimization algorithms for superior overall optimization performance. In this paper we investigate a variant of ESGD for optimization of acoustic models for automatic speech recognition (ASR). In this variant, we assume the existence of a well-trained acoustic model and use it as an anchor in the parent population, whose good "gene" propagates to the offspring through evolution. We propose an ESGD algorithm leveraging the anchor models such that it guarantees the best fitness of the population will never degrade from the anchor model. Experiments on 50-hour Broadcast News (BN50) and 300-hour Switchboard (SWB300) show that the ESGD with anchors can further improve the loss and ASR performance over the existing well-trained acoustic…
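The toy sketch below illustrates the idea of alternating SGD and evolution steps while keeping an anchor in the population. It is not the authors' implementation, and several pieces are assumptions chosen only for illustration:
- a 2-D quadratic loss stands in for an acoustic-model loss;
- a noisy exact gradient stands in for minibatch gradients;
- offspring are produced by Gaussian mutation;
- selection is elitist and runs over refined parents, offspring and a frozen copy of the anchor.

All function names (`loss`, `sgd_step`, `mutate`, `esgd_with_anchor`) and hyperparameters are hypothetical. Because the frozen anchor is always in the selection pool, the best loss of the population can never exceed the anchor's loss.

```python
import random
from typing import List, Tuple

Vector = List[float]


def loss(theta: Vector) -> float:
    """Toy quadratic loss with its minimum at (1, -2); stands in for a real model loss."""
    return (theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2


def sgd_step(theta: Vector, rng: random.Random, lr: float = 0.1, noise: float = 0.5) -> Vector:
    """One SGD step using the exact gradient plus Gaussian noise (a stand-in for minibatch noise)."""
    grad = [2.0 * (theta[0] - 1.0), 2.0 * (theta[1] + 2.0)]
    return [t - lr * (g + rng.gauss(0.0, noise)) for t, g in zip(theta, grad)]


def mutate(theta: Vector, rng: random.Random, sigma: float = 0.3) -> Vector:
    """Gradient-free variation: add isotropic Gaussian noise to a parent (assumed operator)."""
    return [t + rng.gauss(0.0, sigma) for t in theta]


def esgd_with_anchor(
    anchor: Vector,
    pop_size: int = 4,
    n_offspring: int = 8,
    generations: int = 20,
    sgd_steps: int = 5,
    seed: int = 0,
) -> Tuple[Vector, List[float]]:
    """Hypothetical ESGD-with-anchor loop; returns the best individual and per-generation best loss."""
    rng = random.Random(seed)
    anchor = list(anchor)  # frozen copy; never updated in place
    # Initial parent population: the anchor plus perturbed copies of it (assumption).
    population = [anchor] + [mutate(anchor, rng, sigma=1.0) for _ in range(pop_size - 1)]
    best_history: List[float] = []
    for _ in range(generations):
        # Gradient-aware step: refine every individual with a few SGD updates.
        refined = []
        for theta in population:
            for _ in range(sgd_steps):
                theta = sgd_step(theta, rng)  # returns a new list
            refined.append(theta)
        # Gradient-free step: create offspring by mutating randomly chosen parents.
        offspring = [mutate(rng.choice(refined), rng) for _ in range(n_offspring)]
        # Elitist selection over parents, offspring and the frozen anchor.
        pool = refined + offspring + [anchor]
        pool.sort(key=loss)
        population = pool[:pop_size]
        best_history.append(loss(population[0]))
    return population[0], best_history


if __name__ == "__main__":
    start = [0.0, 0.0]  # anchor with toy loss 5.0
    best, history = esgd_with_anchor(start)
    print(loss(start), history[-1], best)
```

Here the anchor acts as a floor on the population's best loss. With a poor anchor (such as `[0.0, 0.0]`), the SGD and mutation steps can move the population well below that floor. With an anchor already at the optimum, selection simply keeps it.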
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis
