Fight Poison with Poison: Enhancing Robustness in Few-shot Machine-Generated Text Detection with Adversarial Training

Wenjing Duan; Qi Zhou; Yuanfan Li

arXiv:2605.02374 · cs.CR · May 5, 2026

TL;DR

This paper introduces REACT, an adversarial training framework that enhances the robustness and accuracy of few-shot machine-generated text detection by co-evolving a humanization-oriented attacker and a detector.
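The co-evolution of attacker and detector can be pictured as an alternating loop. The sketch below is a generic adversarial-training skeleton, not REACT itself: the page does not give REACT's update schedule, so `co_train`, `toy_fit`, `toy_attack` and the synonym table are hypothetical stand-ins. It assumes the attacker sees only the detector's hard labels (matching the output-only black-box setting in the abstract) and that only rewrites which actually evade the current detector are fed back as training data.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, int]        # (text, label); 1 = machine-generated, 0 = human
Detector = Callable[[str], int]  # output-only black box: returns a hard label


def co_train(
    train_set: List[Example],
    fit: Callable[[List[Example]], Detector],  # hypothetical detector trainer
    attack: Callable[[str, Detector], str],    # hypothetical attacker; may only query labels
    rounds: int = 3,
) -> Detector:
    """Alternate attacker and detector updates (generic sketch, not REACT itself)."""
    data = list(train_set)
    detector = fit(data)
    for _ in range(rounds):
        # Attacker step: rewrite machine text and keep only rewrites that evade
        # the current detector (the "hard" adversarial examples).
        adversarial = []
        for text, label in data:
            if label == 1:
                rewritten = attack(text, detector)
                if detector(rewritten) == 0:
                    adversarial.append((rewritten, 1))
        if not adversarial:
            break  # the attacker no longer finds evasions
        # Detector step: retrain with the evasive rewrites labeled as machine.
        data.extend(adversarial)
        detector = fit(data)
    return detector


# --- Toy instantiation (hypothetical, for illustration only) ---
SYNONYMS = {"delve": "explore", "utilize": "use"}


def toy_fit(data: List[Example]) -> Detector:
    # Flag any text containing a token seen only in machine-labeled examples.
    machine = {t for text, y in data if y == 1 for t in text.split()}
    human = {t for text, y in data if y == 0 for t in text.split()}
    suspicious = machine - human
    return lambda text: int(any(t in suspicious for t in text.split()))


def toy_attack(text: str, detector: Detector) -> str:
    # Swap one token at a time; stop as soon as the detector answers "human".
    tokens = text.split()
    for i, tok in enumerate(tokens):
        if tok in SYNONYMS:
            tokens[i] = SYNONYMS[tok]
            candidate = " ".join(tokens)
            if detector(candidate) == 0:
                return candidate
    return " ".join(tokens)


train = [("we delve into the results", 1), ("we dig into the results", 0)]
final_detector = co_train(train, toy_fit, toy_attack)
```

In the toy run, the first detector is fooled by the "explore" rewrite; after one round of retraining on that rewrite it flags both the original and the humanized text while still passing the human example.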

Contribution

The paper presents a novel adversarial training method that pairs a retrieval-augmented generation (RAG) attacker with a contrastively trained few-shot detector, improving both detection performance and robustness in few-shot settings.
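The page does not specify REACT's contrastive objective. As an assumed stand-in, the sketch below implements the supervised contrastive (SupCon) loss of Khosla et al.: embeddings sharing a label (e.g. machine vs. human) are pulled together and the rest pushed apart. The function name, the temperature of 0.5, and the 2-D toy embeddings are illustrative choices, not values from the paper.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def supervised_contrastive_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    temperature: float = 0.5,
) -> float:
    """SupCon-style loss: pull same-label embeddings together, push others apart."""
    z = [_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no positive contributes nothing
        # Cosine similarity from anchor i to every other sample, scaled by temperature.
        sims = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                for a in range(n) if a != i}
        # Numerically stable log-sum-exp over all non-anchor samples (the denominator).
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Average negative log-probability of each positive for this anchor.
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = supervised_contrastive_loss(emb, [0, 0, 1, 1])   # labels match the geometry
shuffled = supervised_contrastive_loss(emb, [0, 1, 0, 1])  # labels cut across it
```

The loss is lower when same-label points already cluster, which is the pressure that lets a few-shot detector separate classes from only a handful of labeled examples.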

Findings

1. REACT improves detection F1 by 4.95 points over state-of-the-art (SOTA) detectors.
2. REACT reduces the attack success rate by 3.66 percentage points.
3. Experiments on four datasets show consistent robustness gains.
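The findings are stated in terms of F1 and attack success rate (ASR). The sketch below shows one common way to compute each; the exact definitions used by the paper are not given here, so both are assumptions: F1 is taken on the machine-generated class (papers sometimes report macro-F1 instead), and ASR is the share of machine-generated samples that were detected before the attack but classified as human after it.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for the `positive` class (assumed here: 1 = machine-generated)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def attack_success_rate(pred_before: Sequence[int], pred_after: Sequence[int]) -> float:
    """Share of machine-generated samples detected (1) before the attack
    and classified as human (0) after it (one assumed ASR definition)."""
    detected = [i for i, p in enumerate(pred_before) if p == 1]
    if not detected:
        return 0.0
    evaded = sum(pred_after[i] == 0 for i in detected)
    return evaded / len(detected)
```

Because the findings report differences in points, two systems are compared by subtracting their scores expressed as percentages (100 × value), not by taking a ratio.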

Abstract

Machine-generated text (MGT) detection is critical for regulating online information ecosystems, yet existing detectors often underperform in few-shot settings and remain vulnerable to adversarial, humanizing attacks. To build accurate and robust detectors under limited supervision, we adopt a threat-modeling perspective and study detector vulnerabilities from an attacker's viewpoint under an output-only black-box setting. Motivated by this perspective, we propose RAG-GuidEd Attacker Strengthens ConTrastive Few-shot Detector (REACT), an adversarial training framework that improves both few-shot detection performance and robustness against attacks. REACT couples a humanization-oriented attacker with a target detector: the attacker leverages retrieval-augmented generation (RAG) to craft highly human-like adversarial examples to evade detection, while the detector learns from these…
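The abstract says the attacker uses RAG to craft human-like adversarial examples but does not say what is retrieved. A minimal sketch, assuming the attacker retrieves human-written texts similar to the machine-generated input and uses them as in-context exemplars for a rewriting prompt. A bag-of-words cosine retriever stands in for a real embedding retriever, the LLM call itself is omitted, and `retrieve_human_exemplars` and `build_humanize_prompt` are hypothetical names.

```python
import math
from collections import Counter
from typing import List


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_human_exemplars(query: str, human_corpus: List[str], k: int = 2) -> List[str]:
    """Return the k human-written texts most similar to `query` (bag-of-words cosine)."""
    q = Counter(query.lower().split())
    ranked = sorted(human_corpus,
                    key=lambda doc: _cosine(q, Counter(doc.lower().split())),
                    reverse=True)
    return ranked[:k]


def build_humanize_prompt(mgt: str, exemplars: List[str]) -> str:
    """Assemble a rewriting prompt for a (hypothetical) LLM attacker."""
    shots = "\n".join(f"- {e}" for e in exemplars)
    return ("Rewrite the text so it reads like the human-written examples.\n"
            f"Examples:\n{shots}\n"
            f"Text: {mgt}\nRewrite:")


corpus = ["the cat sat on the mat",
          "stock prices fell sharply today",
          "my cat sat by the window"]
top = retrieve_human_exemplars("a cat sat on a mat", corpus, k=2)
prompt = build_humanize_prompt("a cat sat on a mat", top)
```

Grounding the rewrite in retrieved human text, rather than asking a model to "sound human" unaided, is the intuition behind a RAG-guided attacker; the detector then trains on whatever rewrites slip past it.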
