DeCRED: Decoder-Centric Regularization for Encoder-Decoder Based Speech Recognition

Alexander Polok, Santosh Kesiraju, Karel Beneš, Bolaji Yusuf, Lukáš Burget, Jan Černocký

arXiv:2508.08938 · eess.AS · August 13, 2025

TL;DR

DeCRED introduces a decoder-centric regularization technique for encoder-decoder speech recognition models. It reduces the mean internal language model perplexity by 36.6% relative and improves word error rates on most in-domain and out-of-domain test sets.

Contribution

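The paper proposes DeCRED, a regularization method that adds auxiliary classifiers to the decoder so that intermediate decoder layers also predict the next token. This regularizes the decoder's internal language model and improves robustness and generalization in speech recognition.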

Findings

01. Reduces mean internal LM BPE perplexity by 36.6% relative across 11 test sets

02. Improves WER on multiple test sets, e.g., macro WER from 6.4% to 6.3% in-domain and from 18.2% to 16.2% out-of-domain

03. Achieves competitive WERs with less data and fewer parameters
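The results mix absolute WERs with relative changes, so it helps to make the arithmetic explicit. The sketch below is illustrative only: `relative_reduction` and `macro_average` are hypothetical helper names, and we assume "macro" WER means the unweighted mean over test sets, as is usual. Applied to the macro WERs reported in the abstract, it shows that the out-of-domain gain (18.2% to 16.2%) is far larger in relative terms than the in-domain one (6.4% to 6.3%). By the same formula, a 36.6% relative perplexity reduction means DeCRED's mean perplexity is about 0.634 times the baseline's.

```python
from typing import Sequence


def relative_reduction(baseline: float, new: float) -> float:
    """Relative reduction in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline - new) / baseline


def macro_average(per_set_values: Sequence[float]) -> float:
    """Assumed meaning of 'macro': unweighted mean, each test set counts equally."""
    return sum(per_set_values) / len(per_set_values)


# Macro WERs (%) for baseline -> DeCRED, as reported in the abstract.
print(f"in-domain:     {relative_reduction(6.4, 6.3):.1f}% relative")    # 1.6
print(f"out-of-domain: {relative_reduction(18.2, 16.2):.1f}% relative")  # 11.0
```

A macro average lets small test sets weigh as much as large ones, which is why it can differ from a WER pooled over all words.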

Abstract

This paper presents a simple yet effective regularization for the internal language model induced by the decoder in encoder-decoder ASR models, thereby improving robustness and generalization in both in- and out-of-domain settings. The proposed method, Decoder-Centric Regularization in Encoder-Decoder (DeCRED), adds auxiliary classifiers to the decoder, enabling next token prediction via intermediate logits. Empirically, DeCRED reduces the mean internal LM BPE perplexity by 36.6% relative, averaged over 11 test sets. Furthermore, this translates into actual WER improvements over the baseline in 5 of 7 in-domain and 3 of 4 out-of-domain test sets, reducing macro WER from 6.4% to 6.3% and 18.2% to 16.2%, respectively. On TEDLIUM3, DeCRED achieves 7.0% WER, surpassing the baseline and encoder-centric InterCTC regularization by 0.6% and 0.5%, respectively. Finally, we compare DeCRED with OWSM v3.1 and…
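The abstract describes auxiliary classifiers on the decoder that predict the next token from intermediate logits. The NumPy sketch below shows one way such a training objective could be assembled. It is not the authors' implementation, and several details are assumptions: which layers receive classifiers (`aux_layers`), that each gets its own output head rather than sharing the final projection, and the convex weighting controlled by `aux_weight`. The names `decred_style_loss`, `aux_layers`, and `aux_weight` are hypothetical.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last (vocabulary) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def next_token_ce(hidden, W, b, targets) -> float:
    """Mean next-token cross-entropy from one decoder layer.

    hidden: (T, d) decoder states, W: (d, V) and b: (V,) classifier head,
    targets: (T,) integer ids of the next token at each position.
    """
    logp = log_softmax(hidden @ W + b)
    return float(-logp[np.arange(len(targets)), targets].mean())


def decred_style_loss(layer_hiddens, heads, targets, aux_layers, aux_weight):
    """Hypothetical DeCRED-style objective (layer choice and weighting are assumptions).

    layer_hiddens: list of (T, d) outputs, one per decoder layer; the last is the final layer.
    heads: dict mapping "final" and each index in aux_layers to a (W, b) classifier.
    Returns (1 - aux_weight) * final CE + aux_weight * mean auxiliary CE.
    """
    W, b = heads["final"]
    final_loss = next_token_ce(layer_hiddens[-1], W, b, targets)
    aux_losses = []
    for i in aux_layers:
        W_i, b_i = heads[i]  # auxiliary classifier on intermediate decoder layer i
        aux_losses.append(next_token_ce(layer_hiddens[i], W_i, b_i, targets))
    aux_loss = sum(aux_losses) / len(aux_losses) if aux_losses else 0.0
    return (1.0 - aux_weight) * final_loss + aux_weight * aux_loss


# Toy usage with random decoder states (shapes are illustrative only).
rng = np.random.default_rng(0)
T, d, V, num_layers = 5, 8, 10, 6
layer_hiddens = [rng.normal(size=(T, d)) for _ in range(num_layers)]
targets = rng.integers(0, V, size=T)
heads = {i: (0.1 * rng.normal(size=(d, V)), np.zeros(V)) for i in (2, 4)}
heads["final"] = (0.1 * rng.normal(size=(d, V)), np.zeros(V))
loss = decred_style_loss(layer_hiddens, heads, targets, aux_layers=(2, 4), aux_weight=0.3)
print(f"combined loss: {loss:.4f}")
```

For contrast, encoder-centric InterCTC attaches CTC losses to intermediate encoder layers. In this sketch the auxiliary terms instead sit on decoder layers and use next-token cross-entropy, which is what targets the decoder's internal language model.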

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders