Enhancing Pre-trained Language Model with Lexical Simplification

Rongzhou Bao; Jiayi Wang; Zhuosheng Zhang; Hai Zhao

arXiv:2012.15070 · cs.CL · January 1, 2021 · 1 citation

TL;DR

This paper introduces a novel method that uses lexical simplification to enhance pre-trained language models' accuracy in text classification by reducing lexical complexity.

Contribution

It proposes a rule-based lexical simplification approach that, when combined with pre-trained language models (PrLMs) such as BERT and ELECTRA, improves text classification performance.

Findings

01 Improved accuracy on multiple text classification tasks.

02 Effective use of simplified sentences as auxiliary inputs.

03 Enhancement over baseline PrLMs without simplification.

Abstract

For both human readers and pre-trained language models (PrLMs), lexical diversity may lead to confusion and inaccuracy when understanding the underlying semantic meanings of given sentences. By substituting complex words with simple alternatives, lexical simplification (LS) is a recognized method to reduce such lexical diversity, and therefore to improve the understandability of sentences. In this paper, we leverage LS and propose a novel approach which can effectively improve the performance of PrLMs in text classification. A rule-based simplification process is applied to a given sentence. PrLMs are encouraged to predict the real label of the given sentence with auxiliary inputs from the simplified version. Using strong PrLMs (BERT and ELECTRA) as baselines, our approach can still further improve the performance in various text classification tasks.
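As a rough illustration of the idea in the abstract, the sketch below applies a toy rule-based substitution to a sentence and pairs the original with its simplified version, so that a classifier could receive both. The substitution table and the names `SIMPLE_SUBSTITUTES`, `simplify`, and `build_pair` are assumptions made for this example; the abstract does not specify the paper's actual rules or how the auxiliary input is fed to the PrLM.

```python
import re
from typing import Tuple

# Hypothetical substitution table: complex word -> simpler alternative.
# These entries are illustrative only; they are not the paper's rules.
SIMPLE_SUBSTITUTES = {
    "utilize": "use",
    "commence": "begin",
    "purchase": "buy",
    "superb": "great",
}


def simplify(sentence: str) -> str:
    """Replace each listed complex word with its simpler alternative."""

    def swap(match):
        word = match.group(0)
        simple = SIMPLE_SUBSTITUTES.get(word.lower())
        if simple is None:
            return word  # word is not in the table, keep it unchanged
        # Preserve a leading capital letter from the original word.
        return simple.capitalize() if word[0].isupper() else simple

    return re.sub(r"[A-Za-z]+", swap, sentence)


def build_pair(sentence: str) -> Tuple[str, str]:
    """Return (original, simplified) so a classifier can see both versions."""
    return sentence, simplify(sentence)


original, simplified = build_pair("Customers purchase this superb phone.")
print(original)
print(simplified)
```

One plausible way to use such a pair, again an assumption rather than the paper's stated design, is to pass the original and simplified sentences to the PrLM as a two-segment input while training against the label of the original sentence.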

Taxonomy

Topics: Text Readability and Simplification · Topic Modeling · Natural Language Processing Techniques