Hybrid phonetic-neural model for correction in speech recognition systems

Rafael Viana-Cámara; Mario Campos-Soberanis; Diego Campos-Sobrino

arXiv:2102.06744 · eess.AS · February 16, 2021

TL;DR

This paper presents a hybrid approach combining phonetic correction and deep neural networks to improve speech recognition accuracy in domain-specific applications, demonstrating reduced word error rates.

Contribution

It introduces a novel combination of phonetic correction algorithms with deep learning models for post-processing in speech recognition systems.

Findings

1. Reduced word error rate in transcriptions
2. Deep learning enhances phonetic correction effectiveness
3. Viability of hybrid models for domain-specific ASR

Abstract

Automatic speech recognition (ASR) is a relevant area in multiple settings because it provides a natural communication mechanism between applications and users. ASR systems often fail in environments that use language specific to particular application domains. Some strategies have been explored to reduce errors in closed ASR systems through post-processing, particularly automatic spell checking and deep learning approaches. In this article, we explore using a deep neural network to refine the results of a phonetic correction algorithm applied to a telesales audio database. The results exhibit a reduction in the word error rate (WER) relative to both the original transcription and the phonetic correction, which shows the viability of combining deep learning models with post-processing correction strategies to reduce errors made by closed ASR systems in specific language domains.
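Since the abstract reports its results as word error rate, a minimal sketch of how WER is conventionally computed may help: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions, not taken from the paper or its repository, and the paper's own evaluation pipeline may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / number of reference words.

    Hypothetical helper for illustration; tokenization is plain whitespace
    splitting, with no case folding or punctuation handling.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up example sentences, not drawn from the telesales data.
    reference = "quiero una pizza grande"
    hypothesis = "quiero una pisa"
    print(word_error_rate(reference, hypothesis))
```

In a post-processing setting like the one described, the same function would be applied to the raw ASR output and to each corrected output against a common reference, so the scores can be compared directly.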


Code & Models

Repositories

MaxSob/ASRNeuralClassification (official)


Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Topic Modeling