Whispy: Adapting STT Whisper Models to Real-Time Environments

Antonio Bevilacqua, Paolo Saviano, Alessandro Amirante, Simon Pietro Romano

arXiv:2405.03484 · cs.SD · May 7, 2024 · 1 citation



Open Access

TL;DR

Whispy is a system that adapts Whisper models for real-time speech transcription, achieving low latency and high accuracy through architectural optimizations, enabling practical live speech analysis applications.

Contribution

This paper introduces Whispy, a system that enables real-time transcription with pretrained Whisper models through architectural optimizations aimed at low latency and high accuracy.

Findings

01. Whispy maintains high transcription accuracy in real-time settings.

02. The system demonstrates robustness across diverse speech datasets.

03. Whispy achieves low computational cost suitable for practical deployment.

Abstract

Large general-purpose transformer models have recently become the mainstay in the realm of speech analysis. In particular, Whisper achieves state-of-the-art results in relevant tasks such as speech recognition, translation, language identification, and voice activity detection. However, Whisper models are not designed to be used in real-time conditions, and this limitation makes them unsuitable for a vast plethora of practical applications. In this paper, we introduce Whispy, a system intended to bring live capabilities to the Whisper pretrained models. As a result of a number of architectural optimisations, Whispy is able to consume live audio streams and generate high level, coherent voice transcriptions, while still maintaining a low computational cost. We evaluate the performance of our system on a large repository of publicly available speech datasets, investigating how the…
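The abstract does not describe how Whispy consumes a live stream, so the following is only a generic sketch of one common way to feed streaming audio to an offline model: keep a rolling buffer of recent samples and emit overlapping fixed-length windows. The function name `rolling_windows` and the parameters `window_s` and `hop_s` are hypothetical and are not taken from the paper; the 16 kHz default reflects the sample rate that Whisper models expect.

```python
from collections import deque
from typing import Iterable, Iterator, List, Sequence

SAMPLE_RATE = 16_000  # Whisper models operate on 16 kHz mono audio


def rolling_windows(
    chunks: Iterable[Sequence[float]],
    window_s: float = 5.0,   # hypothetical window length, in seconds
    hop_s: float = 1.0,      # hypothetical hop between windows, in seconds
    sample_rate: int = SAMPLE_RATE,
) -> Iterator[List[float]]:
    """Yield overlapping windows of samples from a stream of audio chunks.

    This is an illustrative buffering scheme, not Whispy's actual design.
    """
    window = int(window_s * sample_rate)
    hop = int(hop_s * sample_rate)
    # A bounded deque drops the oldest sample once the window is full.
    buffer: deque = deque(maxlen=window)
    pending = 0  # samples received since the last emitted window

    for chunk in chunks:
        for sample in chunk:
            buffer.append(sample)
            pending += 1
            # Emit once the buffer is full and at least one hop has elapsed.
            if len(buffer) == window and pending >= hop:
                pending = 0
                yield list(buffer)
```

Each yielded window could then be passed to a transcription function, and the overlap between consecutive windows gives the model shared context across calls. A shorter hop reduces latency at the price of more model invocations, which is the kind of cost and latency trade-off the abstract alludes to.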


Taxonomy

Topics: Ethics in Business and Education