Learning to Substitute Words with Model-based Score Ranking
Hongye Liu, Ricardo Henao

TL;DR
This paper introduces a model-based approach for word substitution that uses sentence quality scores to improve text without relying on human-labeled data, outperforming existing models.
Contribution
It proposes a novel, annotation-free method for word substitution using model-based scoring and a new loss function to optimize sentence quality.
Findings
Outperforms BERT, BART, GPT-4, and LLaMA in substitution quality.
Avoids reliance on human annotations, reducing labeling costs.
Enhances sentence quality through a novel scoring and ranking approach.
Abstract
Smart word substitution aims to enhance sentence quality by improving word choices; however, current benchmarks rely on human-labeled data. Since word choices are inherently subjective, ground-truth word substitutions generated by a small group of annotators are often incomplete and likely not generalizable. To circumvent this issue, we instead employ a model-based score (BARTScore) to quantify sentence quality, thus forgoing the need for human annotations. Specifically, we use this score to define a distribution for each word substitution, allowing one to test whether a substitution is statistically superior relative to others. In addition, we propose a loss function that directly optimizes the alignment between model predictions and sentence scores, while also enhancing the overall quality score of a substitution. Crucially, model learning no longer requires human labels, thus avoiding…
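The idea of ranking substitutions by a sentence-level score, and of training a model so that its predictions agree with that ranking, can be illustrated with a minimal sketch. This is not the paper's implementation: `toy_score` and `TOY_WORD_QUALITY` are hypothetical stand-ins for a real scorer such as BARTScore, and `pairwise_ranking_loss` is a generic pairwise hinge loss used here only to show what "aligning predictions with scores" could look like, not the loss the authors propose.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical per-word quality table; a stand-in for a learned sentence scorer.
TOY_WORD_QUALITY: Dict[str, float] = {"strong": 0.9, "good": 0.5, "nice": 0.2}


def toy_score(tokens: List[str]) -> float:
    """Assumed scorer: sums per-word quality (real systems would score the whole sentence)."""
    return sum(TOY_WORD_QUALITY.get(tok, 0.0) for tok in tokens)


def substitute(tokens: List[str], index: int, candidate: str) -> List[str]:
    """Return a copy of the sentence with the word at `index` replaced."""
    return tokens[:index] + [candidate] + tokens[index + 1:]


def rank_candidates(
    tokens: List[str],
    index: int,
    candidates: List[str],
    score_fn: Callable[[List[str]], float],
) -> List[Tuple[str, float]]:
    """Score each substituted sentence and sort candidates from best to worst."""
    scored = [(c, score_fn(substitute(tokens, index, c))) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def pairwise_ranking_loss(
    predicted: List[float], quality: List[float], margin: float = 0.1
) -> float:
    """Generic hinge loss: penalize pairs whose predicted order disagrees with the quality order."""
    total, pairs = 0.0, 0
    for i in range(len(predicted)):
        for j in range(len(predicted)):
            if quality[i] > quality[j]:
                # Candidate i should be predicted above candidate j by at least `margin`.
                total += max(0.0, margin - (predicted[i] - predicted[j]))
                pairs += 1
    return total / pairs if pairs else 0.0


sentence = ["the", "results", "are", "good"]
ranked = rank_candidates(sentence, 3, ["good", "strong", "nice"], toy_score)
quality = [score for _, score in ranked]

# Predictions that agree with the score ranking incur no loss; reversed ones do.
aligned_loss = pairwise_ranking_loss(quality, quality)
reversed_loss = pairwise_ranking_loss(quality[::-1], quality)
print(ranked, aligned_loss, reversed_loss)
```

In this sketch no human labels appear anywhere: the ordering used for supervision comes entirely from the scoring function, which is the property the abstract emphasizes.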
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Semantic Web and Ontologies · Advanced Text Analysis Techniques
