Hystoc: Obtaining word confidences for fusion of end-to-end ASR systems

Karel Beneš; Martin Kocour; Lukáš Burget

arXiv:2305.12579 · cs.CL · May 23, 2023 · 1 citation

Open Access

TL;DR

Hystoc is a novel method that derives well-calibrated word confidences from end-to-end speech recognition hypotheses, improving fusion performance and accuracy estimation.

Contribution

Hystoc introduces an iterative alignment approach to extract word confidences from hypothesis scores, enhancing system fusion and confidence calibration.

Findings

01. Hystoc produces confidences that correlate well with hypothesis accuracy.

02. Fusion with Hystoc improves WER by up to 1% absolute on the Spanish RTVE2020 dataset.

03. Gains are limited when Hystoc is used to fuse very similar systems.

Abstract

End-to-end (e2e) systems have recently gained wide popularity in automatic speech recognition. However, these systems generally do not provide well-calibrated word-level confidences. In this paper, we propose Hystoc, a simple method for obtaining word-level confidences from hypothesis-level scores. Hystoc is an iterative alignment procedure that turns the hypotheses from an n-best output of the ASR system into a confusion network. Word-level confidences are then obtained as posterior probabilities in the individual bins of the confusion network. We show that Hystoc provides confidences that correlate well with the accuracy of the ASR hypothesis. Furthermore, we show that utilizing Hystoc in the fusion of multiple e2e ASR systems increases the gains from the fusion by up to 1% absolute WER on the Spanish RTVE2020 dataset. Finally, we experiment with using Hystoc for direct fusion of…
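The procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the softmax normalization of hypothesis scores, the unit-cost edit-distance alignment, the `<eps>` symbol, and all function names are assumptions made for illustration only. Hypotheses are merged one at a time, best first, into a list of bins, and each bin accumulates the posterior mass of the words aligned to it.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

EPS = "<eps>"  # hypothetical symbol meaning "no word in this bin"
Bin = Dict[str, float]


def hypothesis_posteriors(log_scores: Sequence[float]) -> List[float]:
    """Softmax over hypothesis-level log scores (an assumed normalization)."""
    top = max(log_scores)
    exps = [math.exp(s - top) for s in log_scores]
    total = sum(exps)
    return [e / total for e in exps]


def align(bins: List[Bin], words: Sequence[str]) -> List[Tuple[Optional[int], Optional[str]]]:
    """Edit-distance alignment of a word sequence to the current bins.

    Returns (bin_index, word) pairs in order; bin_index is None when the
    word needs a new bin, word is None when the hypothesis skips a bin.
    """
    n, m = len(bins), len(words)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 or j == 0:
                cost[i][j] = i + j
                continue
            sub = 0 if words[j - 1] in bins[i - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,
                             cost[i - 1][j] + 1,
                             cost[i][j - 1] + 1)
    pairs: List[Tuple[Optional[int], Optional[str]]] = []
    i, j = n, m
    while i > 0 or j > 0:  # backtrace from the end of both sequences
        if i > 0 and j > 0:
            sub = 0 if words[j - 1] in bins[i - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + sub:
                pairs.append((i - 1, words[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((i - 1, None))
            i -= 1
            continue
        pairs.append((None, words[j - 1]))
        j -= 1
    pairs.reverse()
    return pairs


def build_confusion_network(nbest: Sequence[Tuple[Sequence[str], float]]) -> List[Bin]:
    """Merge (words, log_score) hypotheses into bins, best hypothesis first."""
    ordered = sorted(nbest, key=lambda h: h[1], reverse=True)
    posts = hypothesis_posteriors([score for _, score in ordered])
    bins: List[Bin] = []
    seen_mass = 0.0  # posterior mass of the hypotheses merged so far
    for (words, _), p in zip(ordered, posts):
        merged: List[Bin] = []
        for bin_idx, word in align(bins, words):
            if bin_idx is None:
                # New bin: earlier hypotheses had no word at this position.
                b: Bin = {EPS: seen_mass} if seen_mass > 0 else {}
                b[word] = p
            else:
                b = bins[bin_idx]
                key = word if word is not None else EPS
                b[key] = b.get(key, 0.0) + p
            merged.append(b)
        bins = merged
        seen_mass += p
    return bins


def word_confidences(nbest: Sequence[Tuple[Sequence[str], float]]) -> List[Tuple[str, float]]:
    """Top word of each bin with its posterior; bins won by EPS are dropped."""
    result = []
    for b in build_confusion_network(nbest):
        word, conf = max(b.items(), key=lambda kv: kv[1])
        if word != EPS:
            result.append((word, conf))
    return result


# Toy 3-best list with made-up log scores.
nbest = [("a b c".split(), -1.0), ("a x c".split(), -2.0), ("a c".split(), -3.0)]
confidences = word_confidences(nbest)
```

In this toy example all three hypotheses agree on the first and last word, so those bins carry the full posterior mass, while the middle bin splits its mass between `b`, `x`, and `<eps>`. Because every hypothesis contributes its posterior to every bin exactly once, the entries of each bin sum to one and can be read directly as word-level confidences.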

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing