Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-trained BERT
Dongyang Dai, Zhiyong Wu, Shiyin Kang, Xixin Wu, Jia Jia, Dan Su, Dong Yu, Helen Meng

TL;DR
This paper introduces an end-to-end Chinese polyphone disambiguation framework utilizing pre-trained BERT to extract semantic features, significantly improving pronunciation prediction accuracy without preprocessing.
Contribution
It presents a novel end-to-end approach combining BERT and neural classifiers for Chinese polyphone disambiguation, leveraging semantic features for enhanced performance.
Findings
BERT-based features improve disambiguation accuracy
Among the classifiers tested, the LSTM and Transformer variants perform well
Contextual information significantly impacts disambiguation results
Abstract
Grapheme-to-phoneme (G2P) conversion serves as an essential component in Chinese Mandarin text-to-speech (TTS) systems, where polyphone disambiguation is the core issue. In this paper, we propose an end-to-end framework to predict the pronunciation of a polyphonic character, which accepts a sentence containing a polyphonic character as input in the form of a Chinese character sequence, without the necessity of any preprocessing. The proposed method consists of a pre-trained bidirectional encoder representations from Transformers (BERT) model and a neural network (NN) based classifier. The pre-trained BERT model extracts semantic features from a raw Chinese character sequence, and the NN based classifier predicts the polyphonic character's pronunciation according to the BERT output. In our experiments, we implemented three classifiers, a fully-connected network based classifier, a long short-term…
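The data flow the abstract describes (raw character sequence, per-character features, classifier over pronunciations) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `encode` is a hypothetical stand-in that returns random, non-contextual vectors where a real system would run pre-trained BERT, the classifier weights are untrained, and the pinyin labels are chosen only for the example.

```python
import math
import random


def encode(chars, dim=8, seed=0):
    """Hypothetical stand-in for pre-trained BERT: one vector per character.

    Unlike BERT, these vectors are random and carry no context.
    """
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in chars]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(feature, weights, bias):
    """Fully-connected layer followed by softmax over pronunciation labels."""
    logits = [
        sum(w * x for w, x in zip(row, feature)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


def predict_pronunciation(sentence, position, labels, weights, bias):
    """Encode the raw character sequence, then classify the target character."""
    features = encode(list(sentence))
    probs = classify(features[position], weights, bias)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs


# Illustrative input: the polyphonic character 乐 with two candidate readings.
sentence = "我喜欢音乐"
labels = ["le4", "yue4"]
rng = random.Random(1)
weights = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in labels]  # untrained
bias = [0.0 for _ in labels]

label, probs = predict_pronunciation(
    sentence, sentence.index("乐"), labels, weights, bias
)
print(label, probs)
```

Because the weights are random, the predicted label is meaningless here; the sketch only shows that the input is the unprocessed character sequence and that the classifier reads the feature vector at the polyphonic character's position.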
Taxonomy
Topics: Music and Audio Processing · Speech Recognition and Synthesis · Natural Language Processing Techniques
Methods: Attention Is All You Need · Tanh Activation · Sigmoid Activation · Byte Pair Encoding · Attention Dropout · Linear Layer · Softmax · Dense Connections · Linear Warmup With Linear Decay
