ProLex: A Benchmark for Language Proficiency-oriented Lexical Substitution
Xuanming Zhang, Zixun Chen, Zhou Yu

TL;DR
ProLex introduces a new benchmark and models for lexical substitution that focus on generating substitutes of equal or higher language proficiency, aiding language learners in improving their writing skills.
Contribution
The paper introduces a new task, language proficiency-oriented lexical substitution, together with the ProLex benchmark for evaluating it and models that generate contextually appropriate substitutes of equal or higher proficiency than the target word.
Findings
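The best model, a Llama2-13B fine-tuned on task-specific synthetic data, outperforms ChatGPT by an average of 3.2% in F-score on ProLex.
The same model achieves results comparable to GPT-4 on ProLex.
ProLex is designed to assess whether a system's substitutes are both contextually appropriate and of better language proficiency than the target word.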
Abstract
Lexical Substitution discovers appropriate substitutes for a given target word in a context sentence. However, the task fails to consider substitutes that are of equal or higher proficiency than the target, an aspect that could be beneficial for language learners looking to improve their writing. To bridge this gap, we propose a new task, language proficiency-oriented lexical substitution. We also introduce ProLex, a novel benchmark designed to assess systems' ability to generate not only appropriate substitutes but also substitutes that demonstrate better language proficiency. Besides the benchmark, we propose models that can automatically perform the new task. We show that our best model, a Llama2-13B model fine-tuned with task-specific synthetic data, outperforms ChatGPT by an average of 3.2% in F-score and achieves comparable results with GPT-4 on ProLex.
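The abstract reports results as an F-score but this summary does not spell out how it is computed. As a rough illustration only, the sketch below shows one common way to score a predicted substitute list against a gold list using set-based precision, recall, and F1. The function name `substitute_prf`, the lowercasing step, and the example words are assumptions made for illustration, not the paper's evaluation code.

```python
from typing import Iterable, Tuple


def substitute_prf(predicted: Iterable[str], gold: Iterable[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall and F1 for one target word.

    Hypothetical helper: ProLex's actual metric may differ
    (for example in normalization or weighting).
    """
    # Normalize case and whitespace so "Rapid " and "rapid" count as the same word.
    pred_set = {w.strip().lower() for w in predicted if w.strip()}
    gold_set = {w.strip().lower() for w in gold if w.strip()}

    overlap = len(pred_set & gold_set)
    if overlap == 0:
        # Covers empty inputs and the no-match case without dividing by zero.
        return 0.0, 0.0, 0.0

    precision = overlap / len(pred_set)
    recall = overlap / len(gold_set)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up example: three system substitutes scored against four gold substitutes.
p, r, f = substitute_prf(["swift", "rapid", "big"], ["rapid", "swift", "speedy", "fast"])
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

A benchmark-level score would typically average the per-instance values over all target words. Whether ProLex averages this way is not stated here.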
Taxonomy
Topics: Text Readability and Simplification · Natural Language Processing Techniques · Topic Modeling
Methods: Multi-Head Attention · Attention Is All You Need · Absolute Position Encodings · Layer Normalization · Label Smoothing · Residual Connection · Dropout · Linear Layer · Byte Pair Encoding · Adam
