Learning to Substitute Words with Model-based Score Ranking

Hongye Liu; Ricardo Henao

arXiv:2502.05933·cs.CL·February 18, 2025


TL;DR

This paper introduces a model-based approach to word substitution that uses sentence quality scores (BARTScore) in place of human-labeled data, and reports better substitution quality than BERT, BART, GPT-4, and LLaMA.

Contribution

It proposes a novel, annotation-free method for word substitution using model-based scoring and a new loss function to optimize sentence quality.

Findings

01  Outperforms BERT, BART, GPT-4, and LLaMA in substitution quality.

02  Avoids reliance on human annotations, reducing labeling costs.

03  Enhances sentence quality through a novel scoring and ranking approach.

Abstract

Smart word substitution aims to enhance sentence quality by improving word choices; however, current benchmarks rely on human-labeled data. Since word choices are inherently subjective, ground-truth word substitutions generated by a small group of annotators are often incomplete and unlikely to generalize. To circumvent this issue, we instead employ a model-based score (BARTScore) to quantify sentence quality, thus forgoing the need for human annotations. Specifically, we use this score to define a distribution for each word substitution, allowing one to test whether a substitution is statistically superior relative to others. In addition, we propose a loss function that directly optimizes the alignment between model predictions and sentence scores, while also enhancing the overall quality score of a substitution. Crucially, model learning no longer requires human labels, thus avoiding…
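The idea of ranking substitutions by a sentence-level score, and of penalizing a model whose preferences disagree with that score, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: `toy_score` is a hypothetical stand-in for a model-based score such as BARTScore, and `pairwise_ranking_loss` is a generic hinge-style ranking loss, not the loss function proposed in the paper (which the truncated abstract does not specify).

```python
from typing import Callable, List, Sequence, Tuple


def substitute(tokens: Sequence[str], index: int, candidate: str) -> List[str]:
    """Return a copy of `tokens` with the word at `index` replaced."""
    out = list(tokens)
    out[index] = candidate
    return out


def rank_candidates(
    tokens: Sequence[str],
    index: int,
    candidates: Sequence[str],
    score_fn: Callable[[Sequence[str]], float],
) -> List[Tuple[str, float]]:
    """Score each substituted sentence and sort candidates, best first."""
    scored = [(c, score_fn(substitute(tokens, index, c))) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def pairwise_ranking_loss(
    model_logits: Sequence[float],
    quality_scores: Sequence[float],
    margin: float = 0.1,
) -> float:
    """Generic hinge loss over candidate pairs (an assumption, not the paper's loss).

    For every pair where candidate i has a higher quality score than
    candidate j, the model is penalized unless its logit for i exceeds
    its logit for j by at least `margin`.
    """
    total, pairs = 0.0, 0
    for i in range(len(model_logits)):
        for j in range(len(model_logits)):
            if quality_scores[i] > quality_scores[j]:
                total += max(0.0, margin - (model_logits[i] - model_logits[j]))
                pairs += 1
    return total / pairs if pairs else 0.0


def toy_score(tokens: Sequence[str]) -> float:
    """Hypothetical stand-in for a model-based sentence score (higher is better)."""
    preferred = {"excellent": 0.0, "good": -1.0, "nice": -2.0}
    return sum(preferred.get(t, 0.0) for t in tokens)


sentence = ["The", "results", "are", "good"]
ranking = rank_candidates(sentence, 3, ["nice", "excellent", "good"], toy_score)
scores = [s for _, s in ranking]
aligned_loss = pairwise_ranking_loss([2.0, 1.0, 0.0], scores)
misaligned_loss = pairwise_ranking_loss([0.0, 1.0, 2.0], scores)
```

In this sketch the loss is zero when the model's ordering of candidates matches the score-based ordering by at least the margin, and positive otherwise. A real setup would replace `toy_score` with an actual scoring model and compute the loss on differentiable tensors.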


Code & Models

Repositories

hyfred/substitute-words-with-ranking
PyTorch · Official

Videos

Learning to Substitute Words with Model-based Score Ranking · Underline

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Semantic Web and Ontologies · Advanced Text Analysis Techniques