Chinese Idiom Paraphrasing
Jipeng Qiang, Yang Li, Chaowei Zhang, Yun Li, Yunhao Yuan, Yi Zhu, Xindong Wu

TL;DR
This paper introduces Chinese Idiom Paraphrasing (CIP), a novel task to rephrase idiomatic sentences into non-idiomatic ones, enhancing Chinese NLP applications by creating a large dataset and proposing effective baseline methods.
Contribution
The study pioneers the CIP task, constructs a large-scale dataset, and develops new approaches that outperform baselines in idiom paraphrasing.
Findings
Proposed methods outperform baselines on the CIP dataset
Established a large-scale CIP dataset with 115,530 sentence pairs
Demonstrated improved NLP task performance using CIP preprocessing
Abstract
Idioms are a kind of idiomatic expression in Chinese, most of which consist of four Chinese characters. Because of their non-compositionality and metaphorical meaning, Chinese idioms are hard for children and non-native speakers to understand. This study proposes a novel task, denoted as Chinese Idiom Paraphrasing (CIP). CIP aims to rephrase sentences containing idioms into non-idiomatic ones while preserving the original sentence's meaning. Since sentences without idioms are easier for Chinese NLP systems to handle, CIP can be used to pre-process Chinese datasets, thereby facilitating and improving the performance of Chinese NLP tasks, e.g., machine translation, Chinese idiom cloze, and Chinese idiom embeddings. In this study, the CIP task is treated as a special paraphrase generation task. To circumvent difficulties in acquiring annotations, we first…
Taxonomy
Topics: Natural Language Processing Techniques · Language, Metaphor, and Cognition · Topic Modeling
