DeCRED: Decoder-Centric Regularization for Encoder-Decoder Based Speech Recognition
Alexander Polok, Santosh Kesiraju, Karel Beneš, Bolaji Yusuf, Lukáš Burget, Jan Černocký

TL;DR
DeCRED introduces a decoder-centric regularization technique for encoder-decoder speech recognition models, significantly reducing internal language model perplexity and improving word error rates across multiple datasets and domains.
Contribution
The paper proposes DeCRED, a novel regularization method that adds auxiliary classifiers to the decoder, enhancing robustness and generalization in speech recognition.
Findings
Reduces mean internal LM BPE perplexity by 36.6% relative, averaged over 11 test sets
Improves macro WER from 6.4% to 6.3% in-domain and from 18.2% to 16.2% out-of-domain
Achieves competitive WERs with less data and fewer parameters
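The perplexity figure is a relative reduction, while the WER figures are quoted as absolute before-and-after values. As a small worked example, the macro WER numbers reported in the abstract can be converted to relative reductions as follows. The helper name `relative_reduction` is illustrative and not from the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


# Macro WER values quoted in the abstract (percent).
in_domain = relative_reduction(6.4, 6.3)
out_of_domain = relative_reduction(18.2, 16.2)

print(f"in-domain relative WER reduction: {in_domain:.1%}")          # 1.6%
print(f"out-of-domain relative WER reduction: {out_of_domain:.1%}")  # 11.0%
```

Read this way, the out-of-domain gain (about 11% relative) is much larger than the in-domain one (about 1.6% relative).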
Abstract
This paper presents a simple yet effective regularization for the internal language model induced by the decoder in encoder-decoder ASR models, thereby improving robustness and generalization in both in- and out-of-domain settings. The proposed method, Decoder-Centric Regularization in Encoder-Decoder (DeCRED), adds auxiliary classifiers to the decoder, enabling next token prediction via intermediate logits. Empirically, DeCRED reduces the mean internal LM BPE perplexity by 36.6% relative across 11 test sets. Furthermore, this translates into actual WER improvements over the baseline in 5 of 7 in-domain and 3 of 4 out-of-domain test sets, reducing macro WER from 6.4% to 6.3% and from 18.2% to 16.2%, respectively. On TEDLIUM3, DeCRED achieves 7.0% WER, surpassing the baseline and encoder-centric InterCTC regularization by 0.6% and 0.5%, respectively. Finally, we compare DeCRED with OWSM v3.1 and…
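The core idea described above, next-token prediction from intermediate decoder logits, can be illustrated with a minimal sketch. This is not the authors' implementation: which layers are tapped, the loss weights, and the toy logits are all assumptions chosen for illustration, and the function name `decred_style_loss` is hypothetical. The sketch computes a next-token cross-entropy at each tapped decoder layer and combines them with a weighted sum. It also shows how perplexity follows from the mean negative log-likelihood.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized vector."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def cross_entropy(logits_seq, targets):
    """Mean negative log-likelihood of the target tokens (in nats)."""
    nll = [-log_softmax(logits)[t] for logits, t in zip(logits_seq, targets)]
    return sum(nll) / len(nll)


def perplexity(mean_nll):
    """Perplexity is the exponential of the mean negative log-likelihood."""
    return math.exp(mean_nll)


def decred_style_loss(layer_logits, targets, layer_weights):
    """Weighted sum of next-token losses, one per tapped decoder layer.

    Assumption: each tapped layer has its own auxiliary classifier that
    produced the logits in `layer_logits`; the weights are hypothetical.
    """
    if len(layer_logits) != len(layer_weights):
        raise ValueError("need exactly one weight per tapped layer")
    return sum(
        w * cross_entropy(logits_seq, targets)
        for logits_seq, w in zip(layer_logits, layer_weights)
    )


# Toy example: vocabulary of 3 tokens, 2 target positions.
targets = [0, 2]
intermediate = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # uninformative layer
final = [[2.0, 0.0, 0.0], [0.0, 0.0, 2.0]]         # favours the targets
weights = [0.4, 0.6]                               # illustrative values

loss = decred_style_loss([intermediate, final], targets, weights)
print("combined loss:", loss)
print("intermediate-layer perplexity:", perplexity(cross_entropy(intermediate, targets)))
print("final-layer perplexity:", perplexity(cross_entropy(final, targets)))
```

In this toy setup the uninformative intermediate layer has a perplexity of about 3 (uniform over three tokens). Training against the combined loss would push the intermediate layers toward better next-token predictions as well as the final one.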
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders
