Mitigating Paraphrase Attacks on Machine-Text Detectors via Paraphrase Inversion

Rafael Rivera Soto; Barry Chen; Nicholas Andrews

arXiv:2410.21637 · cs.CL · March 21, 2025

Open Access

TL;DR

This paper introduces a novel paraphrase inversion method to recover original texts from paraphrased versions, significantly improving machine-text detector robustness against paraphrasing attacks across multiple domains.

Contribution

The paper proposes a translation-based approach for paraphrase inversion, demonstrating its effectiveness and generalization to unseen paraphrasing models, enhancing detector performance.

Findings

01

Inversion models improve detector AUROC by 22% on average.

02

Inversion models generalize to paraphrasing models not seen during training.

03

Paraphrase inversion is an effective defense against paraphrasing attacks across multiple domains.
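
For readers unfamiliar with the metric in the first finding, AUROC can be read as the probability that a detector scores a randomly chosen machine-written text higher than a randomly chosen human-written one. The sketch below computes it by direct pairwise comparison. The function name `auroc` and every score in it are made up for illustration; none of these numbers come from the paper.

```python
def auroc(machine_scores, human_scores):
    """Fraction of (machine, human) pairs ranked correctly; ties count as half."""
    wins = 0.0
    for m in machine_scores:
        for h in human_scores:
            if m > h:
                wins += 1.0
            elif m == h:
                wins += 0.5
    return wins / (len(machine_scores) * len(human_scores))


# Hypothetical detector scores (higher means "more likely machine-written").
human = [0.10, 0.20, 0.30, 0.40]
machine_original = [0.60, 0.70, 0.80, 0.90]      # before a paraphrasing attack
machine_paraphrased = [0.15, 0.35, 0.50, 0.25]   # after a paraphrasing attack

auroc_clean = auroc(machine_original, human)        # 1.0: perfectly separated
auroc_attacked = auroc(machine_paraphrased, human)  # 0.625: scores now overlap
print(auroc_clean, auroc_attacked)
```

In this toy setting, paraphrasing pushes machine-text scores into the human range and AUROC falls from 1.0 to 0.625; an inversion step aims to move the scores back toward the first case.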

Abstract

High-quality paraphrases are easy to produce using instruction-tuned language models or specialized paraphrasing models. Although this capability has a variety of benign applications, paraphrasing attacks – paraphrases applied to machine-generated texts – are known to significantly degrade the performance of machine-text detectors. This motivates us to consider the novel problem of paraphrase inversion, where, given paraphrased text, the objective is to recover an approximation of the original text. The closer the approximation is to the original text, the better machine-text detectors will perform. We propose an approach which frames the problem as translation from paraphrased text back to the original text, which requires examples of texts and corresponding paraphrases to train the inversion model. Fortunately, such training data can easily be generated,…
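
The translation framing in the abstract implies a training set of (paraphrase, original) pairs, with the paraphrase as model input and the original as the target. A minimal sketch of assembling such pairs follows. It assumes a toy word-substitution function, `toy_paraphrase`, as a stand-in for a real paraphrasing model; that function, the `InversionExample` type, and `build_inversion_pairs` are hypothetical names for illustration and are not taken from the paper.

```python
from dataclasses import dataclass

# Toy synonym table standing in for a real paraphrasing model (illustrative assumption).
TOY_SYNONYMS = {"quick": "fast", "big": "large", "happy": "glad"}


def toy_paraphrase(text: str) -> str:
    """Replace each word that has an entry in the toy synonym table."""
    return " ".join(TOY_SYNONYMS.get(word, word) for word in text.split())


@dataclass(frozen=True)
class InversionExample:
    source: str  # paraphrased text: the inversion model's input
    target: str  # original text: what the inversion model should recover


def build_inversion_pairs(originals, paraphrase=toy_paraphrase):
    """Paraphrase each original text and pair it with itself as the target."""
    return [InversionExample(source=paraphrase(t), target=t) for t in originals]


originals = ["the quick fox", "a big happy dog"]
pairs = build_inversion_pairs(originals)
for example in pairs:
    print(f"{example.source!r} -> {example.target!r}")
```

Swapping `toy_paraphrase` for calls to an actual paraphraser would yield pairs suitable for fine-tuning a sequence-to-sequence model in the paraphrase-to-original direction; which models and data the authors used is not stated in this excerpt.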

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques