Phonetically-Augmented Discriminative Rescoring for Voice Search Error Correction

Christophe Van Gysel; Maggie Wu; Lyan Verwimp; Caglar Tirkaz; Marco Bertola; Zhihong Lei; Youssef Oualil

arXiv:2506.06117·cs.CL·June 9, 2025

Open Access

TL;DR

This paper introduces a phonetic correction system for voice search ASR: it generates phonetic alternatives to the recognizer's output and rescores them together with the original recognition, reducing word error rate on movie titles.

Contribution

The paper presents a phonetic rescoring method that improves recognition of infrequent words in voice search applications, where recent or rare titles are underrepresented in the paired audio-text data used to train end-to-end ASR models.

Findings

01. Word error rate reduced by up to 7.6%
02. Improved recognition of rare movie titles
03. Effective phonetic rescoring enhances ASR performance

Abstract

End-to-end (E2E) Automatic Speech Recognition (ASR) models are trained using paired audio-text samples that are expensive to obtain, since high-quality ground-truth data requires human annotators. Voice search applications, such as digital media players, leverage ASR to allow users to search by voice rather than with an on-screen keyboard. However, recent or infrequent movie titles may not be sufficiently represented in the E2E ASR system's training data and hence may be recognized poorly. In this paper, we propose a phonetic correction system that consists of (a) a phonetic search, based on the ASR model's output, that generates phonetic alternatives the E2E system may not have considered, and (b) a rescorer component that combines the ASR model's recognition with the phonetic alternatives and selects a final system output. We find that our approach improves word error rate…
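The two-stage idea in the abstract (phonetic search, then rescoring) can be illustrated with a toy sketch. This is not the authors' implementation: the crude spelling-to-sound rules, the string-similarity measure, the hand-set linear weights, and every name below (`phonetic_key`, `phonetic_search`, `rescore`, `popularity`) are assumptions made for illustration. A real system would more plausibly use a pronunciation lexicon or G2P model and a trained rescorer.

```python
import difflib
import re

# Hypothetical, deliberately crude spelling-to-sound rules (an assumption,
# not the paper's phonetic representation).
_RULES = [("ph", "f"), ("ck", "k"), ("qu", "kw"), ("x", "ks"),
          ("c", "k"), ("z", "s"), ("y", "i")]


def phonetic_key(text):
    """Map text to a rough phonetic string."""
    s = re.sub(r"[^a-z]", "", text.lower())
    for src, dst in _RULES:
        s = s.replace(src, dst)
    # Collapse runs of the same letter ("tt" -> "t").
    return re.sub(r"(.)\1+", r"\1", s)


def phonetic_similarity(a, b):
    """Similarity in [0, 1] between the phonetic keys of two strings."""
    return difflib.SequenceMatcher(None, phonetic_key(a), phonetic_key(b)).ratio()


def phonetic_search(hypothesis, catalog, threshold=0.7):
    """Stage (a): return (title, similarity) pairs that sound like the hypothesis."""
    scored = [(title, phonetic_similarity(hypothesis, title)) for title in catalog]
    hits = [(t, s) for t, s in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


def rescore(hypothesis, asr_confidence, alternatives, popularity,
            w_asr=1.0, w_sim=1.0, w_prior=1.0):
    """Stage (b): pick the best candidate with a hand-weighted linear score.

    The weights are illustrative placeholders; a discriminative rescorer
    would learn them from data.
    """
    # The original hypothesis competes with its phonetic alternatives.
    candidates = [(hypothesis, asr_confidence, 1.0)]
    candidates += [(t, 0.0, s) for t, s in alternatives if t != hypothesis]

    def score(candidate):
        title, asr, sim = candidate
        return w_asr * asr + w_sim * sim + w_prior * popularity.get(title, 0.0)

    return max(candidates, key=score)[0]


if __name__ == "__main__":
    catalog = ["Dune: Part Two", "Doom", "The Departed"]
    popularity = {"Dune: Part Two": 0.9, "Doom": 0.4, "The Departed": 0.7}
    hyp = "dune part to"  # made-up misrecognition
    print(rescore(hyp, 0.6, phonetic_search(hyp, catalog), popularity))
```

Keeping the original hypothesis in the candidate set means the sketch falls back to the ASR output whenever no catalog entry is phonetically close, which mirrors the abstract's description of a rescorer that chooses between the recognition and its alternatives.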

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing