Hybrid phonetic-neural model for correction in speech recognition systems
Rafael Viana-Cámara, Mario Campos-Soberanis, Diego Campos-Sobrino

TL;DR
This paper presents a hybrid approach combining phonetic correction and deep neural networks to improve speech recognition accuracy in domain-specific applications, demonstrating reduced word error rates.
Contribution
It introduces a novel combination of phonetic correction algorithms with deep learning models for post-processing in speech recognition systems.
Findings
Reduced word error rate in transcriptions
Deep learning enhances phonetic correction effectiveness
Viability of hybrid models for domain-specific ASR
Abstract
Automatic speech recognition (ASR) is a relevant area in multiple settings because it provides a natural communication mechanism between applications and users. ASRs often fail in environments that use language specific to particular application domains. Some strategies have been explored to reduce errors in closed ASRs through post-processing, particularly automatic spell checking, and deep learning approaches. In this article, we explore using a deep neural network to refine the results of a phonetic correction algorithm applied to a telesales audio database. The results exhibit a reduction in the word error rate (WER), both in the original transcription and in the phonetic correction, which shows the viability of deep learning models together with post-processing correction strategies to reduce errors made by closed ASRs in specific language domains.
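The abstract reports its results as word error rate (WER). As a minimal sketch of how that metric is commonly computed, assuming the standard definition (word-level edit distance between the reference and the hypothesis, divided by the number of reference words), the following may help. The function name `word_error_rate` and the example sentences are hypothetical illustrations, not taken from the paper, and the paper's own evaluation code may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: whitespace tokenization and exact string matching;
    a real evaluation would usually normalize case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: compare a raw transcription and a corrected one
# against the same reference to see whether post-processing lowered WER.
reference = "i want to order a large pizza"
raw_asr = "i want to order a large pisa"
corrected = "i want to order a large pizza"
print(word_error_rate(reference, raw_asr), word_error_rate(reference, corrected))
```

Comparing the WER of the original transcription, the phonetically corrected output, and the neural-refined output against the same reference is the kind of comparison the abstract describes. Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.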
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Topic Modeling
