# Evaluating Automatic Speech Recognition in an Incremental Setting

**Authors:** Ryan Whetten, Mir Tahsin Imtiaz, Casey Kennington

arXiv: 2302.12049 · 2023-02-24

## TL;DR

This paper systematically evaluates six speech recognizers for incremental recognition on English test data, comparing word error rate, latency, and the number of updates to already recognized words. It also proposes Revokes per Second as a new metric for understanding recognizer behavior in real-time applications.

## Contribution

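The paper proposes Revokes per Second as a new metric for evaluating incremental recognition, and it proposes and compares two methods for streaming audio into recognizers. Together these give a clearer picture of how the six evaluated recognizers trade off speed, accuracy, and stability.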
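
This summary does not give the formal definition of Revokes per Second, so the following is only a minimal sketch of one plausible reading: count how many previously emitted words are retracted across successive partial hypotheses, then divide by the audio duration. The function names, the prefix-based notion of a revoke, and the sample hypotheses are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Sequence


def count_revokes(partials: Sequence[Sequence[str]]) -> int:
    """Count words retracted between consecutive partial hypotheses.

    Assumption: a word counts as revoked when it lies outside the longest
    common prefix shared with the next partial hypothesis.
    """
    revokes = 0
    previous: Sequence[str] = []
    for current in partials:
        # Length of the shared prefix between the old and new hypothesis.
        common = 0
        for old_word, new_word in zip(previous, current):
            if old_word != new_word:
                break
            common += 1
        # Every old word past the shared prefix had to be taken back.
        revokes += len(previous) - common
        previous = current
    return revokes


def revokes_per_second(partials: Sequence[Sequence[str]], audio_seconds: float) -> float:
    """Normalize the revoke count by audio length (assumed definition)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return count_revokes(partials) / audio_seconds


# Hypothetical partial outputs for a two-second utterance.
partials = [
    ["four"],
    ["forty"],            # "four" is revoked and replaced
    ["forty", "two"],
    ["forty", "two", "miles"],
]
rate = revokes_per_second(partials, audio_seconds=2.0)
```

Under this reading, a lower value means the recognizer changes its mind less often per unit of audio, which is one way to interpret stability.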

## Key findings

- Local recognizers are generally faster and require fewer updates than cloud-based ones.
- Meta's Wav2Vec is the fastest recognizer.
- Mozilla's DeepSpeech is the most stable in its predictions.

## Abstract

The increasing reliability of automatic speech recognition has proliferated its everyday use. However, for research purposes, it is often unclear which model one should choose for a task, particularly if there is a requirement for speed as well as accuracy. In this paper, we systematically evaluate six speech recognizers using metrics including word error rate, latency, and the number of updates to already recognized words on English test data, as well as propose and compare two methods for streaming audio into recognizers for incremental recognition. We further propose Revokes per Second as a new metric for evaluating incremental recognition and demonstrate that it provides insights into overall model performance. We find that, generally, local recognizers are faster and require fewer updates than cloud-based recognizers. Finally, we find Meta's Wav2Vec model to be the fastest, and find Mozilla's DeepSpeech model to be the most stable in its predictions.
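
Word error rate, one of the metrics named in the abstract, is conventionally the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below shows that standard computation. The whitespace tokenization and the sample sentences are simplifying assumptions, and the paper's own text normalization may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion of a reference word
                    current[j - 1] + 1,      # insertion of a hypothesis word
                    previous[j - 1] + cost,  # match or substitution
                )
            )
        previous = current
    return previous[-1] / len(ref)


# Hypothetical example: one substitution and one deletion over six words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.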

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2302.12049/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/2302.12049/full.md

## References

17 references — full list in the complete paper: https://tomesphere.com/paper/2302.12049/full.md

---
Source: https://tomesphere.com/paper/2302.12049