Updating Only Encoders Prevents Catastrophic Forgetting of End-to-End ASR Models

Yuki Takashima; Shota Horiguchi; Shinji Watanabe; Paola García; Yohei Kawaguchi

arXiv:2207.00216·eess.AS·July 4, 2022·Interspeech

Open Access

TL;DR

This paper presents an incremental domain adaptation method for end-to-end ASR models that prevents catastrophic forgetting by fine-tuning only the linear layers of the encoder, reducing the need for the model-sized sets of extra parameters that conventional approaches require.

Contribution

The paper shows that adapting only the encoder's linear layers is enough to prevent catastrophic forgetting and, building on this finding, proposes an element-wise parameter selection method for efficient adaptation.
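As a rough illustration of what "adapting only the encoder's linear layers" can look like in practice, here is a minimal PyTorch sketch. It is not the authors' code: `ToyASR`, its layer sizes, and the helper `freeze_all_but_encoder_linear` are hypothetical names introduced only for this example, and the sketch assumes the model exposes its encoder as an attribute called `encoder` built from `nn.Linear` modules.

```python
import torch
from torch import nn


class ToyASR(nn.Module):
    """Hypothetical stand-in for an end-to-end ASR model (not the paper's architecture)."""

    def __init__(self) -> None:
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(8, 16),
            nn.LayerNorm(16),
            nn.ReLU(),
            nn.Linear(16, 16),
        )
        self.decoder = nn.Linear(16, 5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))


def freeze_all_but_encoder_linear(model: nn.Module, encoder_attr: str = "encoder") -> list:
    """Freeze every parameter, then re-enable only nn.Linear parameters inside the encoder.

    Returns the names of the parameters left trainable.
    """
    for param in model.parameters():
        param.requires_grad = False

    trainable = []
    encoder = getattr(model, encoder_attr)
    for module_name, module in encoder.named_modules():
        if isinstance(module, nn.Linear):
            prefix = f"{encoder_attr}.{module_name}" if module_name else encoder_attr
            for param_name, param in module.named_parameters(recurse=False):
                param.requires_grad = True
                trainable.append(f"{prefix}.{param_name}")
    return trainable


model = ToyASR()
trainable_names = freeze_all_but_encoder_linear(model)

# Hand only the still-trainable parameters to the optimizer.
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)
```

In this toy model the decoder and the encoder's `LayerNorm` stay frozen, so only the two encoder `nn.Linear` modules receive updates. Which modules count as "linear layers" in a real ASR toolkit depends on how that toolkit builds its encoder, so the `isinstance` check may need adjusting.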

Findings

01. Adapting only encoder linear layers prevents catastrophic forgetting.

02. Element-wise parameter selection reduces fine-tuning parameters.

03. The approach outperforms full-model parameter selection in preserving accuracy.

Abstract

In this paper, we present an incremental domain adaptation technique to prevent catastrophic forgetting for an end-to-end automatic speech recognition (ASR) model. Conventional approaches require extra parameters of the same size as the model for optimization, and it is difficult to apply these approaches to end-to-end ASR models because they have a huge number of parameters. To solve this problem, we first investigate which parts of end-to-end ASR models contribute to high accuracy in the target domain while preventing catastrophic forgetting. We conducted experiments on incremental domain adaptation from the LibriSpeech dataset to the AMI meeting corpus with two popular end-to-end ASR models and found that adapting only the linear layers of their encoders can prevent catastrophic forgetting. Then, on the basis of this finding, we develop an element-wise parameter selection focused on…
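The abstract is cut off before it describes how elements are selected, so the sketch below does not try to reproduce the paper's criterion. It only illustrates the general mechanics of element-wise selection: a boolean mask with the same shape as a weight matrix decides which individual entries are updated. The top-k-by-gradient-magnitude rule, the function names `top_k_mask` and `masked_sgd_step`, and the plain SGD update are all assumptions made for this example.

```python
import numpy as np


def top_k_mask(scores: np.ndarray, k: int) -> np.ndarray:
    """Boolean mask marking the k largest entries of `scores` (placeholder criterion)."""
    if not 1 <= k <= scores.size:
        raise ValueError("k must be between 1 and scores.size")
    flat = scores.ravel()
    top_idx = np.argpartition(flat, -k)[-k:]
    mask = np.zeros(flat.shape, dtype=bool)
    mask[top_idx] = True
    return mask.reshape(scores.shape)


def masked_sgd_step(weight: np.ndarray, grad: np.ndarray,
                    mask: np.ndarray, lr: float = 0.1) -> np.ndarray:
    """Apply an SGD update only to the entries where `mask` is True."""
    return weight - lr * grad * mask


# Toy weight matrix and gradient for one linear layer.
weight = np.ones((2, 3))
grad = np.array([[0.1, -2.0, 0.3],
                 [1.5, 0.0, -0.2]])

mask = top_k_mask(np.abs(grad), k=2)        # assumed rule: largest |gradient|
updated = masked_sgd_step(weight, grad, mask)
```

Only the two selected entries change; every other weight keeps its source-domain value, which is the property that makes element-wise selection attractive when the goal is to limit forgetting.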

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing