Chinese Lexical Simplification
Jipeng Qiang, Xinyu Lu, Yun Li, Yunhao Yuan, Yang Shi, and Xindong Wu

TL;DR
This paper introduces the first benchmark dataset and baseline methods for Chinese lexical simplification, aiming to replace complex words with simpler equivalents to improve readability for children and non-native speakers.
Contribution
It presents the first Chinese lexical simplification dataset and evaluates five baseline methods, establishing a foundation for future research in this area.
Findings
Baseline methods vary in effectiveness and suitability.
Pretrained language models show promising results.
Hybrid approaches outperform individual methods.
Abstract
Lexical simplification, the process of replacing complex words in a given sentence with simpler alternatives of equivalent meaning, has attracted much attention in many languages. Although the richness of Chinese vocabulary makes text very difficult to read for children and non-native speakers, there has been no research on the Chinese lexical simplification (CLS) task. To circumvent difficulties in acquiring annotations, we manually create the first benchmark dataset for CLS, which can be used to evaluate lexical simplification systems automatically. For a more thorough comparison, we present five different types of methods as baselines for generating substitute candidates for the complex word: a synonym-based approach, a word embedding-based approach, a pretrained language model-based approach, a sememe-based approach, and a hybrid approach. Finally,…
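As a rough illustration of the task the abstract describes, the sketch below mimics a synonym-based baseline: look up substitute candidates for a complex word, rank them, and substitute the top one. Everything in it is an assumption made for illustration: the tiny synonym table, the frequency counts, the function names, and the use of word frequency as a proxy for simplicity are hypothetical and are not taken from the paper or its dataset.

```python
# Toy sketch of synonym-based lexical simplification.
# ASSUMPTION: the synonym table and frequency counts below are invented
# for illustration; they are not from the paper's dataset or resources.
SYNONYMS = {
    "棘手": ["难办", "难", "麻烦"],
}

# ASSUMPTION: higher corpus frequency is used as a stand-in for "simpler".
WORD_FREQ = {"棘手": 120, "难办": 300, "难": 9000, "麻烦": 2500}


def generate_candidates(complex_word):
    """Return substitute candidates for a word from the toy synonym table."""
    return list(SYNONYMS.get(complex_word, []))


def rank_by_frequency(candidates):
    """Order candidates from most to least frequent (unknown words count as 0)."""
    return sorted(candidates, key=lambda w: WORD_FREQ.get(w, 0), reverse=True)


def simplify(sentence, complex_word):
    """Replace the first occurrence of complex_word with its top-ranked candidate.

    The sentence is returned unchanged when no candidate exists or when the
    best candidate is not more frequent than the original word.
    """
    ranked = rank_by_frequency(generate_candidates(complex_word))
    if not ranked:
        return sentence
    best = ranked[0]
    if WORD_FREQ.get(best, 0) <= WORD_FREQ.get(complex_word, 0):
        return sentence
    return sentence.replace(complex_word, best, 1)


if __name__ == "__main__":
    print(simplify("这个问题很棘手。", "棘手"))
```

A real system would also need to identify which words are complex and check that a substitute fits the context; this sketch leaves both steps out and takes the target word as given.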
Taxonomy
Topics: Text Readability and Simplification · Natural Language Processing Techniques · Topic Modeling
