Automatic Text Pronunciation Correlation Generation and Application for Contextual Biasing
Gaofeng Cheng, Haitian Lu, Chengxu Yang, Xuyang Wang, Ta Li, Yonghong Yan

TL;DR
This paper introduces a data-driven method called ATPC to automatically generate pronunciation correlations from speech-text data, improving speech recognition performance without manual lexicons.
Contribution
The paper presents a novel automatic approach for extracting pronunciation correlations using speech embeddings, reducing reliance on manual lexicons and enhancing E2E-ASR in contextual biasing.
Findings
ATPC improves Mandarin E2E-ASR accuracy in contextual biasing.
The method is effective for dialects or languages without manual lexicons.
Speech embedding comparison effectively captures pronunciation correlations.
Abstract
Effectively distinguishing the pronunciation correlations between different written texts is a significant issue in linguistic acoustics. Traditionally, such pronunciation correlations are obtained through manually designed pronunciation lexicons. In this paper, we propose a data-driven method to automatically acquire these pronunciation correlations, called automatic text pronunciation correlation (ATPC). The supervision required for this method is consistent with the supervision needed for training end-to-end automatic speech recognition (E2E-ASR) systems, i.e., speech and corresponding text annotations. First, the iteratively-trained timestamp estimator (ITSE) algorithm is employed to align the speech with its corresponding annotated text symbols. Then, a speech encoder is used to convert the speech into speech embeddings. Finally, we compare the speech embedding distances of…
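The abstract breaks off before it says how the embedding distances are compared, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes alignment and encoding have already happened and that each aligned segment arrives as a (text symbol, embedding) pair. The averaging per symbol, the cosine distance, the function names, and the toy two-dimensional embeddings are all illustrative assumptions.

```python
import math
from collections import defaultdict


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def pronunciation_distances(aligned_segments):
    """Pairwise distances between per-symbol mean embeddings.

    aligned_segments: iterable of (symbol, embedding) pairs, assumed to be
    the output of some aligner plus speech encoder (hypothetical input).
    """
    by_symbol = defaultdict(list)
    for symbol, embedding in aligned_segments:
        by_symbol[symbol].append(embedding)
    centroids = {s: mean_vector(v) for s, v in by_symbol.items()}
    symbols = sorted(centroids)
    return {
        (a, b): cosine_distance(centroids[a], centroids[b])
        for i, a in enumerate(symbols)
        for b in symbols[i + 1:]
    }


# Toy data: "A" and "B" are given similar embeddings, "C" a different one.
segments = [
    ("A", [1.0, 0.0]),
    ("A", [0.9, 0.1]),
    ("B", [0.8, 0.2]),
    ("C", [0.0, 1.0]),
]
distances = pronunciation_distances(segments)
```

In this toy setup, a smaller distance between two symbols stands in for a closer pronunciation; a real system would use high-dimensional encoder outputs and whatever comparison the paper actually defines.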
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems
Methods: ALIGN
