Continual learning using lattice-free MMI for speech recognition

Hossein Hadian; Arseniy Gorin

arXiv:2110.07055·eess.AS·October 15, 2021

TL;DR

This paper explores regularization-based continual learning methods for neural network acoustic models in speech recognition, demonstrating improved domain adaptation and reduced forgetting using sequence-level LWF with LF-MMI training.

Contribution

It introduces a sequence-level LWF regularization technique that leverages LF-MMI posteriors to enhance continual learning in speech recognition models.

Findings

1. Sequence-level LWF reduces the average word error rate (WER) by up to 9.4% relative.

2. Regular LWF and EWC help mitigate catastrophic forgetting.

3. The approach effectively adapts models to multiple speech domains.
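
A "relative" WER figure is a percentage of the baseline WER, not a difference in absolute WER points. The sketch below shows the arithmetic; the function name and the input numbers are hypothetical and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical WERs (in percent), chosen only to illustrate the formula.
example = relative_wer_reduction(baseline_wer=10.0, new_wer=9.06)
print(f"{example:.1f}% relative")
```

With these illustrative inputs, a drop of 0.94 absolute WER points from a 10.0% baseline corresponds to a 9.4% relative reduction.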

Abstract

Continual learning (CL), or domain expansion, recently became a popular topic for automatic speech recognition (ASR) acoustic modeling because practical systems have to be updated frequently in order to work robustly on types of speech not observed during initial training. While sequential adaptation allows tuning a system to a new domain, it may result in performance degradation on the old domains due to catastrophic forgetting. In this work we explore regularization-based CL for neural network acoustic models trained with the lattice-free maximum mutual information (LF-MMI) criterion. We simulate domain expansion by incrementally adapting the acoustic model on different public datasets that include several accents and speaking styles. We investigate two well-known CL techniques, elastic weight consolidation (EWC) and learning without forgetting (LWF), which aim to reduce forgetting by…
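
For orientation, the two regularizers named in the abstract can be sketched in their generic textbook form. This is a minimal NumPy sketch under stated assumptions: EWC is taken as a Fisher-weighted quadratic penalty on parameter drift, and LWF as a frame-level cross-entropy between the old model's posteriors and the new model's posteriors. It is not the paper's LF-MMI implementation and does not cover the proposed sequence-level variant; all function and parameter names are hypothetical.

```python
import numpy as np


def ewc_penalty(params, old_params, fisher, lam=1.0):
    """Generic EWC penalty: 0.5 * lam * sum_i F_i * (theta_i - theta_old_i)^2.

    params, old_params, fisher: dicts mapping a parameter name to an array.
    fisher holds (assumed precomputed) diagonal Fisher information estimates.
    """
    total = 0.0
    for name in params:
        diff = params[name] - old_params[name]
        total += float(np.sum(fisher[name] * diff ** 2))
    return 0.5 * lam * total


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def lwf_penalty(new_logits, old_logits):
    """Frame-level LWF term: cross-entropy of the new model's posteriors
    against the old model's posteriors, averaged over frames.

    Both inputs have shape (frames, classes), computed on new-domain data.
    """
    old_post = np.exp(log_softmax(old_logits))  # soft targets from the old model
    new_logp = log_softmax(new_logits)
    return float(-(old_post * new_logp).sum(axis=-1).mean())


# Toy usage with made-up values: the total loss would add these terms,
# each scaled by its own weight, to the new-domain training loss.
theta_old = {"w": np.array([0.0, 0.0])}
theta_new = {"w": np.array([1.0, 2.0])}
fisher = {"w": np.array([1.0, 0.5])}
print(ewc_penalty(theta_new, theta_old, fisher, lam=2.0))
print(lwf_penalty(np.array([[2.0, 0.0, 0.0]]), np.array([[0.0, 0.0, 0.0]])))
```

Both terms are smallest when the adapted model stays close to the old one: the EWC term is zero when the parameters are unchanged, and the LWF term is minimized when the new posteriors match the old ones.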

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing