Improving Automatic Speech Recognition with Decoder-Centric Regularisation in Encoder-Decoder Models

Alexander Polok; Santosh Kesiraju; Karel Beneš; Lukáš Burget; Jan Černocký

arXiv:2410.17437·eess.AS·October 24, 2024

Open Access · 1 model

TL;DR

This paper introduces DeCRED, a decoder-centric regularisation method for encoder-decoder ASR models that enhances robustness and out-of-domain generalisation, leading to improved WERs with less data and smaller models.

Contribution

The paper proposes a novel regularisation approach, DeCRED, with auxiliary classifiers in the decoder, improving ASR performance and out-of-domain robustness over existing models.

Findings

01 DeCRED lowers WER by 2.7-2.9 on the AMI and Gigaspeech datasets.

02 DeCRED enhances out-of-domain generalisation.

03 Strong baseline models achieve competitive results with less data.
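
The findings are stated in terms of word error rate (WER). As a reminder of how that metric is conventionally computed, here is a minimal sketch: the word-level edit distance (substitutions, deletions, insertions) between a reference and a hypothesis transcript, divided by the number of reference words and reported as a percentage. The function name and the whitespace tokenisation are assumptions made for illustration; the paper's own scoring and text normalisation are not described on this page.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent, using word-level Levenshtein distance.

    Tokenisation by whitespace is an illustrative assumption; real ASR
    scoring usually applies text normalisation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return 100.0 * prev[-1] / len(ref)
```

For example, scoring the hypothesis "the cat sit on mat" against the reference "the cat sat on the mat" counts one substitution and one deletion over six reference words.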

Abstract

This paper proposes a simple yet effective way of regularising encoder-decoder-based automatic speech recognition (ASR) models that enhances the robustness of the model and improves generalisation to out-of-domain scenarios. The proposed approach is dubbed Decoder-Centric Regularisation in Encoder-Decoder (DeCRED) architecture for ASR, in which auxiliary classifier(s) are introduced in layers of the decoder module. Leveraging these classifiers, we propose two decoding strategies that re-estimate the next-token probabilities. Using the recent E-branchformer architecture, we build strong ASR systems that obtain competitive WERs compared to Whisper-medium and outperform OWSM v3, while relying on only a fraction of the training data and model size. On top of such a strong baseline, we show that DeCRED can further improve the…
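
To make the idea of auxiliary classifiers in decoder layers more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes that each auxiliary classifier produces its own logits over the vocabulary, that training adds a weighted auxiliary cross-entropy term to the final-layer loss, and that next-token probabilities can be re-estimated by log-linear interpolation of two classifiers. The names `regularised_loss`, `reestimate_next_token`, `aux_weight` and `mix`, and the interpolation rule itself, are hypothetical; the truncated abstract does not specify the paper's actual loss weighting or its two decoding strategies.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(logits, target):
    """Negative log-likelihood of the target token index."""
    return -log_softmax(logits)[target]


def regularised_loss(final_logits, aux_logits_per_layer, target, aux_weight=0.5):
    """Final-layer loss blended with the mean auxiliary-classifier loss.

    aux_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    main = cross_entropy(final_logits, target)
    if not aux_logits_per_layer:
        return main
    aux = sum(cross_entropy(l, target) for l in aux_logits_per_layer)
    aux /= len(aux_logits_per_layer)
    return (1.0 - aux_weight) * main + aux_weight * aux


def reestimate_next_token(final_logits, aux_logits, mix=0.5):
    """Assumed re-estimation rule: interpolate the two classifiers'
    log-probabilities and renormalise to a valid distribution."""
    lp_final = log_softmax(final_logits)
    lp_aux = log_softmax(aux_logits)
    combined = [(1.0 - mix) * a + mix * b for a, b in zip(lp_final, lp_aux)]
    return log_softmax(combined)
```

With `aux_weight=0.0` the loss reduces to the ordinary final-layer cross-entropy, which makes it easy to compare a regularised run against an unregularised baseline under this sketch.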

Code & Models

Models

🤗 BUT-FIT/DeCRED-base · model · 29 downloads

Taxonomy

Topics: Speech Recognition and Synthesis

Methods: E-Branchformer