HACo-Det: A Study Towards Fine-Grained Machine-Generated Text Detection under Human-AI Coauthoring

Zhixiong Su; Yichen Wang; Herun Wan; Zhaohan Zhang; Minnan Luo

arXiv:2506.02959·cs.CL·June 4, 2025

TL;DR

This paper investigates fine-grained detection of machine-generated text in human-AI coauthored content, proposing a new dataset and adapting existing detectors to identify AI contributions at word and sentence levels.

Contribution

The paper introduces HACo-Det, a dataset of human-AI coauthored texts with word-level attribution labels, and adapts seven document-level detectors to word-level detection. It also outlines the open challenges and future directions for fine-grained detection.

Findings

01. Finetuned models outperform metric-based methods in detection accuracy.

02. Detection performance is influenced by context window size.

03. Fine-grained detection remains a challenging problem with room for improvement.

Abstract

The misuse of large language models (LLMs) poses potential risks, motivating the development of machine-generated text (MGT) detection. Existing literature primarily concentrates on binary, document-level detection, thereby neglecting texts that are composed jointly by human and LLM contributions. Hence, this paper explores the possibility of fine-grained MGT detection under human-AI coauthoring. We suggest fine-grained detectors can pave pathways toward coauthored text detection with a numeric AI ratio. Specifically, we propose a dataset, HACo-Det, which produces human-AI coauthored texts via an automatic pipeline with word-level attribution labels. We retrofit seven prevailing document-level detectors to generalize them to word-level detection. Then we evaluate these detectors on HACo-Det on both word- and sentence-level detection tasks. Empirical results show that metric-based…
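To make the "numeric AI ratio" idea concrete, here is a minimal Python sketch of how word-level labels could be turned into a document-level ratio and into sentence-level labels. The function names, the 0/1 label encoding, and the 0.5 threshold rule are assumptions for illustration only; the summary above does not state how the paper performs this aggregation.

```python
from typing import List


def ai_ratio(word_labels: List[int]) -> float:
    """Fraction of words labeled machine-generated (1) rather than human (0).

    Assumed encoding: 1 = machine-generated word, 0 = human-written word.
    """
    if not word_labels:
        return 0.0
    return sum(word_labels) / len(word_labels)


def sentence_labels(
    word_labels: List[int],
    sentence_lengths: List[int],
    threshold: float = 0.5,
) -> List[int]:
    """Aggregate word-level labels into one label per sentence.

    A sentence is marked machine-generated (1) when its own AI ratio
    exceeds `threshold`. This rule is an illustrative assumption,
    not necessarily the rule used in the paper.
    """
    labels = []
    start = 0
    for length in sentence_lengths:
        span = word_labels[start:start + length]
        labels.append(1 if ai_ratio(span) > threshold else 0)
        start += length
    return labels


# Hypothetical coauthored text: three sentences of 4, 4, and 2 words.
words = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
lengths = [4, 4, 2]

print(ai_ratio(words))                  # document-level AI ratio
print(sentence_labels(words, lengths))  # one label per sentence
```

A word-level detector would supply `word_labels` as predictions; the same two functions then give a document-level ratio and sentence-level decisions without retraining anything.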

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling