MAGA-Bench: Machine-Augment-Generated Text via Alignment Detection Benchmark
Anyang Song, Ying Cheng, Yiqian Xu, Rui Feng

TL;DR
This paper introduces MAGA-Bench, a benchmark for evaluating and improving the detection of machine-augmented text. A new generation pipeline produces better-aligned machine text, and the resulting dataset is used both to test detector robustness and to fine-tune detectors for better generalization.
Contribution
The paper proposes MAGA, a novel pipeline for generating aligned machine-augmented text, and introduces a dataset that improves detector generalization and robustness against fake text.
Findings
Fine-tuning a RoBERTa detector on the MAGA dataset improved its AUC by 4.60%.
The MAGA dataset lowered detector AUC by 8.13%, indicating that its text is harder to detect.
The MAGA pipeline enhances alignment from prompt construction through the reasoning process, which supports detector robustness.
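The findings above are stated as changes in AUC. As a rough illustration of how such a change could be measured, the sketch below computes AUC from detector scores and reports the difference in percentage points. The function name `roc_auc`, the labels, and the scores are all hypothetical toy values, not data from the paper, and the summary does not say whether its percentages are absolute points or relative changes; the sketch assumes absolute points.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUC: the probability that a random machine-generated text
    (label 1) receives a higher detector score than a random human-written
    text (label 0). Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (invented for illustration): 1 = machine text, 0 = human text.
labels = [1, 1, 1, 0, 0, 0]
# Hypothetical detector scores on an easier test set ...
scores_easy = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
# ... and on a harder, better-aligned test set.
scores_hard = [0.9, 0.4, 0.2, 0.5, 0.3, 0.1]

auc_easy = roc_auc(labels, scores_easy)
auc_hard = roc_auc(labels, scores_hard)

# Change in AUC expressed in absolute percentage points (an assumption
# about how the summary's figures are reported).
delta_points = (auc_hard - auc_easy) * 100
print(f"easy AUC={auc_easy:.4f}, hard AUC={auc_hard:.4f}, change={delta_points:+.2f} points")
```

The pairwise loop is quadratic in the number of examples, which is fine for a toy check; a rank-based library routine would be the usual choice on a full benchmark.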
Abstract
Large Language Model (LLM) alignment is constantly evolving, and Machine-Generated Text (MGT) is becoming increasingly difficult to distinguish from Human-Written Text (HWT). This has exacerbated abuse issues such as fake news and online fraud. The generalization ability of fine-tuned detectors depends heavily on dataset quality, and simply expanding the sources of MGT is insufficient. Further augmentation of the generation process is required. According to HC-Var's theory, enhancing the alignment of generated text can not only facilitate attacks on existing detectors to test their robustness, but also help improve the generalization ability of detectors fine-tuned on it. Therefore, we propose Machine-Augment-Generated Text via Alignment (MAGA). MAGA's pipeline achieves comprehensive alignment from prompt construction to reasoning process, among which…
Taxonomy
Topics: Topic Modeling · Hate Speech and Cyberbullying Detection · Adversarial Robustness in Machine Learning
