Chinese Lexical Simplification

Jipeng Qiang; Xinyu Lu; Yun Li; Yunhao Yuan; Yang Shi; Xindong Wu

arXiv:2010.07048·cs.CL·October 15, 2020

TL;DR

This paper introduces the first benchmark dataset and baseline methods for Chinese lexical simplification, the task of replacing complex words with simpler equivalents to improve readability for children and non-native speakers.

Contribution

It presents the first Chinese lexical simplification dataset and evaluates five baseline methods, establishing a foundation for future research in this area.

Findings

01. Baseline methods vary in effectiveness and suitability.

02. Pretrained language models show promising results.

03. Hybrid approaches outperform individual methods.

Abstract

Lexical simplification, the process of replacing complex words in a given sentence with simpler alternatives of equivalent meaning, has attracted much attention in many languages. Although the richness of the Chinese vocabulary makes text very difficult to read for children and non-native speakers, there has been no research on the Chinese lexical simplification (CLS) task. To circumvent the difficulty of acquiring annotations, we manually create the first benchmark dataset for CLS, which can be used to evaluate lexical simplification systems automatically. To enable a more thorough comparison, we present five types of baseline methods for generating substitute candidates for the complex word: a synonym-based approach, a word embedding-based approach, a pretrained language model-based approach, a sememe-based approach, and a hybrid approach. Finally,…
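
As a rough illustration of what "generating substitute candidates" can look like for the word embedding-based family of baselines, here is a minimal Python sketch. It is not the authors' implementation (see the linked repository for that). The vectors in `TOY_EMBEDDINGS`, the function names, and the choice of cosine similarity as the ranking score are all assumptions made for this example; a real system would load pretrained Chinese word vectors and would also need to check that a candidate is actually simpler than the target.

```python
import math

# Hypothetical toy vectors, invented purely for illustration.
# They are NOT taken from the paper or from any pretrained model.
TOY_EMBEDDINGS = {
    "迅速": [0.90, 0.10, 0.00],  # "rapid", treated here as the complex word
    "快速": [0.85, 0.15, 0.05],  # "quick"
    "快": [0.80, 0.20, 0.10],    # "fast"
    "苹果": [0.00, 0.10, 0.90],  # "apple", an unrelated word
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def generate_candidates(target, embeddings, k=2):
    """Return the k words whose vectors are closest to the target word.

    Returns an empty list if the target has no vector.
    """
    if target not in embeddings:
        return []
    target_vec = embeddings[target]
    scored = [
        (word, cosine(target_vec, vec))
        for word, vec in embeddings.items()
        if word != target  # never propose the complex word itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scored[:k]]


candidates = generate_candidates("迅速", TOY_EMBEDDINGS, k=2)
```

With these toy vectors the two nearest neighbours of "迅速" are "快速" and "快", while the unrelated "苹果" is ranked last. Nearest-neighbour lookup only captures relatedness, which is one plausible reason to combine several candidate sources, as the hybrid baseline mentioned in the abstract does.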

Code & Models

Repositories

luxinyu1/Chinese-LS (PyTorch)

Taxonomy

Topics: Text Readability and Simplification · Natural Language Processing Techniques · Topic Modeling