On the Generalization and Adaptation Ability of Machine-Generated Text Detectors in Academic Writing

Yule Liu; Zhiyuan Zhong; Yifan Liao; Zhen Sun; Jingyi Zheng; Jiaheng Wei; Qingyuan Gong; Fenghua Tong; Yang Chen; Yang Zhang; Xinlei He

arXiv:2412.17242·cs.AI·August 5, 2025

TL;DR

This paper investigates the generalization and adaptation capabilities of machine-generated text detectors in academic writing, introducing a large dataset, benchmarking various detectors, and proposing methods for adaptive detection in evolving scenarios.

Contribution

The paper presents MGT-Academic, a large-scale academic writing dataset; benchmarks detector performance across domains; and introduces an adaptive attribution task with multiple adaptation techniques.

Findings

1. Detectors face challenges in attribution tasks across domains.
2. Adaptive techniques improve detection performance in limited data scenarios.
3. The study highlights the complexity of generalizing MGT detection in academic contexts.

Abstract

The rising popularity of large language models (LLMs) has raised concerns about machine-generated text (MGT), particularly in academic settings, where issues like plagiarism and misinformation are prevalent. As a result, developing a highly generalizable and adaptable MGT detection system has become an urgent priority. Given that LLMs are most commonly misused in academic writing, this work investigates the generalization and adaptation capabilities of MGT detectors in three key aspects specific to academic writing. First, we construct MGT-Academic, a large-scale dataset comprising over 336M tokens and 749K samples. MGT-Academic focuses on academic writing, featuring human-written texts (HWTs) and MGTs across STEM, Humanities, and Social Sciences, paired with an extensible code framework for efficient benchmarking. Second, we benchmark the performance of various detectors for binary…

Code & Models

Repositories

Y-L-LIU/MGTBench-2.0
PyTorch · Official

Datasets

AITextDetect/AI_Polish_clean
dataset · 543 downloads

Taxonomy

Topics: Text and Document Classification Technologies