MAGA-Bench: Machine-Augment-Generated Text via Alignment Detection Benchmark

Anyang Song; Ying Cheng; Yiqian Xu; Rui Feng

arXiv:2601.04633·cs.CL·January 9, 2026

Open Access 4 Models 4 Datasets

TL;DR

This paper introduces MAGA-Bench, a benchmark for evaluating and improving the detection of machine-augmented text. It pairs a generation pipeline that enhances the alignment of generated text with a dataset used both to test detector robustness and to fine-tune detectors for better generalization.

Contribution

The paper proposes MAGA, a pipeline for generating machine-augmented text with enhanced alignment, and introduces a dataset that improves the generalization of detectors fine-tuned on it and tests the robustness of existing detectors.

Findings

01

The RoBERTa detector's AUC improved by 4.60% after fine-tuning on the MAGA dataset.

02

Evaluating detectors on the MAGA dataset lowered their AUC by 8.13%, indicating that its text is harder to detect.

03

The MAGA pipeline enhances alignment at every stage, from prompt construction to the reasoning process, which supports testing and improving detector robustness.
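The findings above are stated in terms of detector AUC. As a minimal sketch of how such a number can be computed, the Python below implements the rank-based definition of ROC AUC and compares two sets of detector scores. The function name `roc_auc`, the labels, and the scores are all hypothetical toy values chosen for illustration; none of them come from the paper, and the summary does not say whether the reported changes are absolute or relative, so the sketch prints only the plain difference.

```python
def roc_auc(labels, scores):
    """Rank-based ROC AUC.

    Returns the fraction of (machine, human) pairs in which the
    machine-generated text (label 1) gets a higher detector score than the
    human-written text (label 0). Tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (assumed, not from the paper): 1 = machine-generated, 0 = human-written.
labels = [1, 1, 1, 0, 0, 0]
baseline_scores = [0.9, 0.4, 0.35, 0.5, 0.3, 0.2]   # detector before fine-tuning
finetuned_scores = [0.9, 0.6, 0.55, 0.5, 0.3, 0.2]  # detector after fine-tuning

auc_before = roc_auc(labels, baseline_scores)
auc_after = roc_auc(labels, finetuned_scores)
print(f"AUC before: {auc_before:.4f}")
print(f"AUC after:  {auc_after:.4f}")
print(f"Difference: {auc_after - auc_before:+.4f}")
```

The pairwise loop is quadratic in the number of examples, which is fine for a toy illustration; a real evaluation would more likely use a library routine such as scikit-learn's `roc_auc_score`.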

Abstract

The alignment of Large Language Models (LLMs) is constantly evolving, and Machine-Generated Text (MGT) is becoming increasingly difficult to distinguish from Human-Written Text (HWT). This has exacerbated abuse such as fake news and online fraud. The generalization ability of fine-tuned detectors depends heavily on dataset quality, and simply expanding the sources of MGT is insufficient; the generation process itself needs further augmentation. According to HC-Var's theory, enhancing the alignment of generated text can not only facilitate attacks on existing detectors to test their robustness, but also help improve the generalization ability of detectors fine-tuned on it. Therefore, we propose Machine-Augment-Generated Text via Alignment (MAGA). MAGA's pipeline achieves comprehensive alignment from prompt construction to reasoning process, among which…

Taxonomy

Topics: Topic Modeling · Hate Speech and Cyberbullying Detection · Adversarial Robustness in Machine Learning