Leveraging Self-Supervised Models for Automatic Whispered Speech Recognition

Aref Farhadipour; Homa Asadi; Volker Dellwo

arXiv:2407.21211·eess.AS·November 5, 2024

TL;DR

This paper fine-tunes the self-supervised WavLM model to recognize whispered English speech in the Irish dialect, reaching a WER of 9.22% compared with 18.8% for the OpenAI Whisper baseline.

Contribution

It presents a pre-trained WavLM model fine-tuned on a combination of whispered and normal speech from the wTIMIT and CHAINS datasets, addressing both the scarcity of whispered training data and the distinct acoustic properties of whispered speech.

Findings

01 The WavLM-based system achieved a WER of 9.22%.

02 This is a significant improvement over the OpenAI Whisper baseline (WER of 18.8%, CER of 4.24%).

03 The results demonstrate the approach's effectiveness for dialect-specific whispered speech.

Abstract

In automatic speech recognition, any factor that alters the acoustic properties of speech can pose a challenge to the system's performance. This paper presents a novel approach for automatic whispered speech recognition in the Irish dialect using the self-supervised WavLM model. Conventional automatic speech recognition systems often fail to accurately recognise whispered speech due to its distinct acoustic properties and the scarcity of relevant training data. To address this challenge, we utilized a pre-trained WavLM model, fine-tuned with a combination of whispered and normal speech data from the wTIMIT and CHAINS datasets, which include the English language in Singaporean and Irish dialects, respectively. Our baseline evaluation with the OpenAI Whisper model highlighted its limitations, achieving a Word Error Rate (WER) of 18.8% and a Character Error Rate (CER) of 4.24% on whispered…
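The WER and CER figures quoted above are both edit-distance ratios: the number of substitutions, deletions, and insertions needed to turn the system output into the reference transcript, divided by the reference length in words (WER) or characters (CER). The following is a minimal sketch of that calculation, not the authors' scoring code. The function names are hypothetical, and the abstract does not say which scoring tool or text normalization (casing, punctuation) was used, so none is applied here.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the first i-1 reference tokens
    # and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # Illustrative transcripts, not taken from wTIMIT or CHAINS.
    reference = "the cat sat on the mat"
    hypothesis = "the cat sat on a mat"
    # One substituted word out of six reference words.
    print(f"WER: {wer(reference, hypothesis):.2%}")
    print(f"CER: {cer(reference, hypothesis):.2%}")
```

Because a single wrong word usually changes only a few characters, CER is typically much lower than WER on the same output, which is consistent with the gap between the 18.8% WER and 4.24% CER reported for the baseline.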

Code & Models

Repositories

areffarhadi/Whisper_fine_tuning_ASR
pytorch

Taxonomy

Topics: Speech Recognition and Synthesis