Confidence-based Ensembles of End-to-End Speech Recognition Models

Igor Gitman; Vitaly Lavrukhin; Aleksandr Laptev; Boris Ginsburg

arXiv:2306.15824·eess.AS·August 21, 2023

TL;DR

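This paper introduces confidence-based ensembles of end-to-end speech recognition models, in which only the output of the most confident model is used. The approach combines expert models across languages and domains while assuming that only a small validation set of each model's target data is available.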

Contribution

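It proposes a confidence-based ensemble that uses only the output of the most confident model. An ensemble of 5 monolingual models built this way outperforms a system that selects the model with a dedicated language identification block, and the same approach can combine base and adapted models.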

Findings

01 Confidence-based ensembles outperform language ID-based selection.

02 Combining base and adapted models improves accuracy on multiple datasets.

03 Effective with limited target data and various architectures.

Abstract

The number of end-to-end speech recognition models grows every year. These models are often adapted to new domains or languages resulting in a proliferation of expert systems that achieve great results on target data, while generally showing inferior performance outside of their domain of expertise. We explore combination of such experts via confidence-based ensembles: ensembles of models where only the output of the most-confident model is used. We assume that models' target data is not available except for a small validation set. We demonstrate effectiveness of our approach with two applications. First, we show that a confidence-based ensemble of 5 monolingual models outperforms a system where model selection is performed via a dedicated language identification block. Second, we demonstrate that it is possible to combine base and adapted models to achieve strong results on both…
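
The selection rule described in the abstract (run every expert, keep only the most confident output) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Hypothesis` type, the `Model` callable interface, and the toy models are all hypothetical, and the sketch assumes each model returns a single confidence score that is directly comparable across models. How the paper estimates confidence, and how it uses the small validation set, is not specified in this summary.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Hypothesis:
    text: str
    # Assumption: higher means more confident, and scores are
    # comparable across models without further calibration.
    confidence: float


# Hypothetical interface: a "model" is any callable from audio to a Hypothesis.
Model = Callable[[bytes], Hypothesis]


def ensemble_transcribe(models: Sequence[Model], audio: bytes) -> Hypothesis:
    """Run every model on the audio and return the most confident hypothesis."""
    if not models:
        raise ValueError("need at least one model")
    hypotheses = [model(audio) for model in models]
    # max() returns the first maximal element, so ties go to the earlier model.
    return max(hypotheses, key=lambda h: h.confidence)


# Toy stand-ins for two monolingual experts (illustrative values only).
def english_model(audio: bytes) -> Hypothesis:
    return Hypothesis("hello world", 0.91)


def spanish_model(audio: bytes) -> Hypothesis:
    return Hypothesis("jelou gurl", 0.42)


best = ensemble_transcribe([english_model, spanish_model], b"")
print(best.text)  # prints "hello world"
```

Note that this sketch runs every model on every input, so inference cost grows with the number of experts; a dedicated language identification block, by contrast, would pick one model before decoding.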

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques

Methods: Balanced Selection