Learning to Pronounce Chinese Without a Pronunciation Dictionary
Christopher Chu, Scot Fang, Kevin Knight

TL;DR
This paper presents an unsupervised approach for converting Chinese characters to Mandarin pronunciations without a pronunciation dictionary. It reaches 89% token-level character-to-syllable accuracy, compared with 22% for prior work.
Contribution
It introduces a novel unsupervised method to learn character-to-syllable mappings from non-parallel data, enabling pronunciation without a dictionary.
Findings
Token-level character-to-syllable accuracy of 89%
Outperforms prior work, which reached 22% accuracy
Effective decoding of Chinese writing into speech
Abstract
We demonstrate a program that learns to pronounce Chinese text in Mandarin, without a pronunciation dictionary. From non-parallel streams of Chinese characters and Chinese pinyin syllables, it establishes a many-to-many mapping between characters and pronunciations. Using unsupervised methods, the program effectively deciphers writing into speech. Its token-level character-to-syllable accuracy is 89%, which significantly exceeds the 22% accuracy of prior work.
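To make the reported metric concrete, here is a minimal sketch of how token-level character-to-syllable accuracy could be computed, and of what a many-to-many mapping between characters and pronunciations looks like as a data structure. It is not the authors' code. The function names (`token_accuracy`, `build_mapping`), the toneless pinyin, and the four-character example are all assumptions chosen for illustration. The aligned pairs appear here only for scoring; the program described above learns from non-parallel streams.

```python
from collections import defaultdict


def token_accuracy(predicted, reference):
    """Fraction of tokens whose predicted syllable matches the reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


def build_mapping(chars, syllables):
    """Collect every syllable observed for each character (many-to-many)."""
    mapping = defaultdict(set)
    for ch, syl in zip(chars, syllables):
        mapping[ch].add(syl)
    return dict(mapping)


# Hypothetical example: 行 is read "hang" in 银行 (bank) and "xing" in 不行 (not OK).
chars = ["银", "行", "不", "行"]
reference = ["yin", "hang", "bu", "xing"]
# A hypothetical system output that always picks "xing" for 行.
predicted = ["yin", "xing", "bu", "xing"]

accuracy = token_accuracy(predicted, reference)  # 3 of 4 tokens correct
mapping = build_mapping(chars, reference)        # 行 maps to two syllables
```

Because the score is computed per token, a character with several readings such as 行 can be right in one context and wrong in another, which is why a one-to-one lookup table cannot reach perfect accuracy.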
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
