PERT: A New Solution to Pinyin to Character Conversion Task

Jinghui Xiao; Qun Liu; Xin Jiang; Yuanfeng Xiong; Haiteng Wu; Zhe Zhang

arXiv:2205.11737·cs.CL·May 25, 2022·1 cites

TL;DR

This paper introduces PERT, a transformer-based model for Pinyin to Character conversion, significantly improving performance over traditional methods and effectively handling out-of-dictionary issues in input method engines.

Contribution

The paper proposes PERT, a novel transformer-based approach for Pinyin to Character conversion, and demonstrates its effectiveness and improvements when combined with n-gram models and external lexicons.

Findings

1. PERT outperforms baseline models on P2C tasks.
2. Combining PERT with n-gram models yields further accuracy gains.
3. Incorporating external lexicons helps address OOD issues.

Abstract

The Pinyin-to-Character (P2C) conversion task is the key task of the Input Method Engine (IME) in commercial input software for Asian languages such as Chinese, Japanese, and Thai. It is usually treated as a sequence labelling task and solved with a language model, i.e. an n-gram model or an RNN. However, the low capacity of n-gram and RNN models limits their performance. This paper introduces a new solution named PERT, which stands for bidirectional Pinyin Encoder Representations from Transformers. It achieves a significant performance improvement over the baselines. Furthermore, we combine PERT with an n-gram model under a Markov framework, which improves performance further. Lastly, an external lexicon is incorporated into PERT to resolve the OOD issue of IME.
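The abstract does not spell out how the encoder and the n-gram model are combined, so the following is only a minimal sketch of one plausible reading: an HMM-style Viterbi decoder in which per-syllable character probabilities (standing in for the encoder output) act as emission scores and a bigram model supplies transition scores. The function name `viterbi`, the `floor` smoothing constant, and all probabilities below are hypothetical toy values, not taken from the paper or its repository.

```python
import math

def viterbi(emissions, bigram, floor=1e-6):
    """Decode the best character sequence for a pinyin sequence.

    emissions: list of dicts, one per pinyin syllable, mapping a candidate
               character to its probability (assumed encoder output).
    bigram:    dict mapping (prev_char, char) to P(char | prev_char).
    floor:     hypothetical back-off probability for unseen bigrams.
    """
    # best[c] = (log score of the best path ending in c, that path)
    best = {c: (math.log(p), [c]) for c, p in emissions[0].items()}
    for dist in emissions[1:]:
        nxt = {}
        for c, p in dist.items():
            # Pick the predecessor that maximises path score + transition.
            score, path = max(
                (s + math.log(bigram.get((prev, c), floor)), pth)
                for prev, (s, pth) in best.items()
            )
            nxt[c] = (score + math.log(p), path + [c])
        best = nxt
    score, path = max(best.values())
    return "".join(path), score

# Toy input for the pinyin "ni hao". Taken alone, the emission scores
# slightly prefer the wrong second character.
emissions = [
    {"你": 0.6, "尼": 0.4},
    {"号": 0.55, "好": 0.45},
]
bigram = {
    ("你", "好"): 0.5,
    ("你", "号"): 0.01,
    ("尼", "好"): 0.01,
    ("尼", "号"): 0.01,
}

decoded, score = viterbi(emissions, bigram)
print(decoded)
```

In this toy case a greedy per-position choice would pick 号 for the second syllable, while the bigram transition steers the joint decode to 你好. How the paper actually weights the two models may differ.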

Code & Models

Repositories

huawei-noah/noah-research/tree/master/noahime/PERT
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Web Data Mining and Analysis