JSPG: Dynamic Dictionary Filtering via Joint Semantic-Pinyin-Glyph Retrieval for Chinese Contextual ASR
Shilin Zhou, Zhenghua Li

TL;DR
This paper introduces JSPG, a joint semantic-pinyin-glyph retrieval framework that enhances Chinese contextual ASR by effectively filtering large keyword dictionaries, especially in homophone-rich scenarios.
Contribution
JSPG combines semantic, pinyin, and glyph features, together with an extended Smith-Waterman similarity algorithm, to improve keyword-dictionary filtering accuracy in Chinese ASR, outperforming single-feature methods.
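To make the idea of joint filtering concrete, here is a minimal sketch of how three per-keyword similarity scores might be fused to keep a short list of candidates. This summary does not state how JSPG actually fuses its features, so the weighted sum, the function name `fuse_and_filter`, the weights, and the toy scores are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Scores = Dict[str, float]  # keyword -> similarity to the ASR hypothesis


def fuse_and_filter(
    semantic: Scores,
    pinyin: Scores,
    glyph: Scores,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
    top_k: int = 2,
) -> List[str]:
    """Rank keywords by a weighted sum of three scores and keep the top_k.

    The weighted sum is an assumed fusion rule, not the paper's method.
    """
    w_s, w_p, w_g = weights
    keywords = set(semantic) | set(pinyin) | set(glyph)
    fused = {
        k: w_s * semantic.get(k, 0.0)
        + w_p * pinyin.get(k, 0.0)
        + w_g * glyph.get(k, 0.0)
        for k in keywords
    }
    # Sort by descending fused score; break ties alphabetically for determinism.
    ranked = sorted(fused, key=lambda k: (-fused[k], k))
    return ranked[:top_k]


# Toy scores (invented): kw_a looks semantically unrelated to the hypothesis
# but matches it phonetically and structurally, as a homophone error would.
semantic = {"kw_a": 0.1, "kw_b": 0.8, "kw_c": 0.2}
pinyin = {"kw_a": 0.9, "kw_b": 0.1, "kw_c": 0.2}
glyph = {"kw_a": 0.7, "kw_b": 0.1, "kw_c": 0.1}

shortlist = fuse_and_filter(semantic, pinyin, glyph)
print(shortlist)
```

In this toy setting a semantic-only ranking would place `kw_b` first, while the fused ranking promotes `kw_a`, which is the kind of behavior the contribution statement attributes to combining features.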
Findings
JSPG significantly outperforms baseline models on Aishell-1 and RWCS-NER datasets.
Guided by JSPG, downstream ASR models show substantial improvements in keyword recognition.
The extended Smith-Waterman algorithm effectively bridges sequence-level and character-level similarity metrics.
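For readers unfamiliar with the base algorithm, the sketch below implements the standard Smith-Waterman local alignment over pinyin syllable sequences. It is not the paper's extended variant, whose details are not given in this summary. The scoring parameters, the length normalization, and the hand-written syllable lists are assumptions chosen for illustration.

```python
from typing import Sequence


def smith_waterman(
    a: Sequence[str],
    b: Sequence[str],
    match: float = 2.0,
    mismatch: float = -1.0,
    gap: float = -1.0,
) -> float:
    """Return the best local alignment score between two token sequences."""
    prev = [0.0] * (len(b) + 1)
    best = 0.0
    for i in range(1, len(a) + 1):
        curr = [0.0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0.0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def normalized_score(keyword: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Scale the score by the best possible score for the keyword (assumed normalization)."""
    if not keyword:
        return 0.0
    return smith_waterman(keyword, hypothesis) / (2.0 * len(keyword))


# Hand-written pinyin syllables (tones omitted); no pinyin library is used here.
hypothesis = ["wo", "xi", "huan", "zhou", "jie", "lun", "de", "ge"]
keyword_hit = ["zhou", "jie", "lun"]
keyword_miss = ["liu", "de", "hua"]

print(normalized_score(keyword_hit, hypothesis))
print(normalized_score(keyword_miss, hypothesis))
```

Because the comparison runs on syllables rather than characters, a keyword whose characters were misrecognized as homophones still aligns fully with the hypothesis, whereas an unrelated keyword only picks up incidental single-syllable matches.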
Abstract
Contextual Automatic Speech Recognition (ASR) faces challenges with large-scale keyword dictionaries, as excessive irrelevant candidates introduce noise that degrades accuracy. To address this, dynamic filtering typically uses a base ASR model to generate preliminary hypotheses, followed by semantic text retrievers to fetch a concise subset of relevant keywords. However, this approach frequently fails in Chinese ASR. Base models often produce homophonic or near-homophonic errors that preserve the phonetic cues of the target keywords but severely distort their semantic meaning, rendering standard semantic retrievers ineffective. To resolve this, we propose a filtering framework that jointly integrates Semantic, Pinyin, and Glyph features (JSPG). Pinyin effectively retrieves targets based on phonetic similarity, while glyph provides complementary structural cues to filter out numerous…
