Don't Be So Sure! Boosting ASR Decoding via Confidence Relaxation

Tomer Wullach; Shlomo E. Chazan

arXiv:2212.13378 · cs.CL · December 29, 2022

TL;DR

This paper introduces a decoding method for ASR that relaxes model confidence and aggregates information from multiple layers, improving recognition performance especially in low-resource settings without extra training or parameters.

Contribution

It proposes a confidence relaxation and layer aggregation technique for ASR decoding that enhances performance without additional training or model complexity.
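The summary names two ingredients, confidence relaxation and layer aggregation, but does not say how layers are combined. The sketch below assumes one simple rule for a single frame. It applies a shared output head to each of the last k layers, converts each result to probabilities, and averages them. It then compares the entropy of that average with the top layer alone. The function `aggregate_top_layers`, the choice of averaging probabilities rather than logits or hidden states, and the toy logits are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one frame's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: List[float]) -> float:
    """Shannon entropy in nats; higher means a less confident distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def aggregate_top_layers(layer_logits: List[List[float]], k: int) -> List[float]:
    """Hypothetical aggregation rule: average the softmax outputs of the last k layers.

    layer_logits[i] holds the logits obtained by applying the (assumed shared)
    output head to layer i's representation for one frame, ordered bottom to top.
    """
    if not 1 <= k <= len(layer_logits):
        raise ValueError("k must be between 1 and the number of layers")
    dists = [softmax(logits) for logits in layer_logits[-k:]]
    num_classes = len(dists[0])
    return [sum(d[c] for d in dists) / k for c in range(num_classes)]


# Toy logits for one frame over a 3-token vocabulary, from a lower layer to the top layer.
layer_logits = [
    [1.0, 0.5, 0.0],
    [2.0, 1.0, 0.0],
    [8.0, 1.0, 0.0],  # the top layer is extremely confident
]

final_only = aggregate_top_layers(layer_logits, k=1)  # top layer alone
aggregated = aggregate_top_layers(layer_logits, k=3)  # average of the last three layers
print(f"top-layer entropy:  {entropy(final_only):.4f}")
print(f"aggregated entropy: {entropy(aggregated):.4f}")
```

In this toy case averaging keeps the same top-1 token while raising the entropy. That leaves more probability mass for a beam search to spend on alternative candidates.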

Findings

01

Improves ASR decoding accuracy across various resource levels.

02

Reduces inference computation compared to existing methods.

03

Shows consistent gains especially in low-resource scenarios.

Abstract

Automatic Speech Recognition (ASR) systems frequently use a search-based decoding strategy that aims to find the best attainable transcript by considering multiple candidates. One prominent speech recognition decoding heuristic is beam search, which seeks the transcript with the greatest likelihood computed from the predicted distribution. While it yields substantial performance gains in various tasks, beam search loses some of its effectiveness when the predicted probabilities are highly confident, i.e., when the predicted distribution concentrates its mass on a single class or very few classes. We show that recently proposed Self-Supervised Learning (SSL)-based ASR models tend to yield exceptionally confident predictions that may prevent beam search from truly considering a diverse set of candidates. We perform a layer analysis to reveal and visualize how predictions evolve, and propose a decoding procedure…
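To make the abstract's argument concrete, the sketch below runs a plain beam search over per-frame token distributions. It ignores CTC blank and repeat handling and measures how much of the beam's probability mass the non-best hypotheses carry. The hypothetical `sharpen` helper applies a temperature only to simulate an overconfident model. It is not the paper's relaxation procedure, and the toy frames are invented for illustration.

```python
import math
from typing import List, Tuple

Beam = Tuple[Tuple[int, ...], float]  # (token sequence, log-probability)


def beam_search(frames: List[List[float]], beam_width: int) -> List[Beam]:
    """Keep the beam_width highest-scoring token sequences after each frame.

    Each frame is a probability distribution over the vocabulary; CTC blank and
    repeat collapsing are deliberately ignored to keep the sketch short.
    """
    beams: List[Beam] = [((), 0.0)]
    for probs in frames:
        # Extend every surviving hypothesis with every token, accumulating log-probs.
        candidates = [
            (seq + (token,), score + math.log(p))
            for seq, score in beams
            for token, p in enumerate(probs)
            if p > 0
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


def sharpen(probs: List[float], temperature: float) -> List[float]:
    """Hypothetical helper: temperature < 1 makes a distribution more confident.

    Probabilities must be strictly positive.
    """
    logits = [math.log(p) / temperature for p in probs]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def alternative_share(beams: List[Beam]) -> float:
    """Fraction of the beam's probability mass held by hypotheses other than the best."""
    masses = [math.exp(score) for _, score in beams]
    return sum(masses[1:]) / sum(masses)


frames = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1]]
overconfident = [sharpen(f, temperature=0.1) for f in frames]

relaxed_beams = beam_search(frames, beam_width=3)
confident_beams = beam_search(overconfident, beam_width=3)
print(f"alternatives' share, original frames:  {alternative_share(relaxed_beams):.3f}")
print(f"alternatives' share, sharpened frames: {alternative_share(confident_beams):.3f}")
```

With the original frames the two runner-up hypotheses hold about 57% of the beam's mass. After sharpening they hold under 10%, so the extra beams add little beyond the greedy path.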

Videos

Don't Be So Sure! Boosting ASR Decoding via Confidence Relaxation (Underline)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing