HACo-Det: A Study Towards Fine-Grained Machine-Generated Text Detection under Human-AI Coauthoring
Zhixiong Su, Yichen Wang, Herun Wan, Zhaohan Zhang, Minnan Luo

TL;DR
This paper investigates fine-grained detection of machine-generated text in human-AI coauthored content, proposing a new dataset and adapting existing detectors to identify AI contributions at word and sentence levels.
Contribution
It introduces HACo-Det, a dataset for human-AI coauthored texts with word-level labels, and adapts document-level detectors for fine-grained detection, highlighting current challenges and future directions.
Findings
Fine-tuned models outperform metric-based methods in detection accuracy.
Detection performance is influenced by context window size.
Fine-grained detection remains a challenging problem with room for improvement.
Abstract
The misuse of large language models (LLMs) poses potential risks, motivating the development of machine-generated text (MGT) detection. Existing literature primarily concentrates on binary, document-level detection, thereby neglecting texts that are composed jointly by human and LLM contributions. Hence, this paper explores the possibility of fine-grained MGT detection under human-AI coauthoring. We suggest fine-grained detectors can pave pathways toward coauthored text detection with a numeric AI ratio. Specifically, we propose a dataset, HACo-Det, which produces human-AI coauthored texts via an automatic pipeline with word-level attribution labels. We retrofit seven prevailing document-level detectors to generalize them to word-level detection. Then we evaluate these detectors on HACo-Det on both word- and sentence-level detection tasks. Empirical results show that metric-based…
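To make the idea of a "numeric AI ratio" and of word- versus sentence-level detection concrete, here is a minimal sketch in Python. It is not the authors' pipeline. The label encoding (1 for machine-generated, 0 for human-written), the function names `ai_ratio` and `sentence_labels`, and the 0.5 threshold for turning word labels into a sentence label are all assumptions made for illustration.

```python
from typing import List


def ai_ratio(word_labels: List[int]) -> float:
    """Fraction of words labeled machine-generated (assumed: 1 = AI, 0 = human)."""
    if not word_labels:
        return 0.0
    return sum(word_labels) / len(word_labels)


def sentence_labels(sentences: List[List[int]], threshold: float = 0.5) -> List[int]:
    """Label a sentence as AI (1) when its word-level AI ratio reaches the threshold.

    The threshold rule is a hypothetical aggregation choice, not taken from the paper.
    """
    return [1 if ai_ratio(words) >= threshold else 0 for words in sentences]


# Toy coauthored document: three sentences, each a list of word-level labels.
sentences = [
    [0, 0, 0, 0],  # fully human-written
    [0, 1, 1, 1],  # mostly machine-generated
    [1, 1, 0, 0],  # evenly split
]

# Document-level AI ratio over all words.
all_words = [label for words in sentences for label in words]
doc_ratio = ai_ratio(all_words)

# Sentence-level labels derived from the word-level labels.
sent = sentence_labels(sentences)

print(f"document AI ratio: {doc_ratio:.3f}")
print(f"sentence labels:   {sent}")
```

Under these assumptions, a word-level detector's predictions could be scored against the word labels directly, or aggregated in the same way and scored at the sentence level.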
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
