LLM-as-a-Coauthor: Can Mixed Human-Written and Machine-Generated Text Be Detected?
Qihui Zhang, Chujie Gao, Dongping Chen, Yue Huang, Yixin Huang, Zhenyang Sun, Shilin Zhang, Weiye Li, Zhengyan Fu, Yao Wan, Lichao Sun

TL;DR
This paper investigates the challenge of detecting mixed human and machine-generated text, introduces a new dataset called MixSet, and evaluates existing detectors, revealing their limitations in identifying subtle mixtext scenarios.
Contribution
It defines mixtext, creates the first dataset for it, and assesses current detectors, highlighting the need for more fine-grained detection methods.
Findings
Existing detectors struggle with mixtext detection.
Detectors are less effective on subtle modifications.
Current methods lack robustness and generalization.
Abstract
With the rapid development and widespread application of Large Language Models (LLMs), the use of Machine-Generated Text (MGT) has become increasingly common, bringing with it potential risks, especially in terms of quality and integrity in fields like news, education, and science. Current research mainly focuses on purely MGT detection without adequately addressing mixed scenarios, including AI-revised Human-Written Text (HWT) or human-revised MGT. To tackle this challenge, we define mixtext, a form of mixed text involving both AI and human-generated content. Then, we introduce MixSet, the first dataset dedicated to studying these mixtext scenarios. Leveraging MixSet, we executed comprehensive experiments to assess the efficacy of prevalent MGT detectors in handling mixtext situations, evaluating their performance in terms of effectiveness, robustness, and generalization. Our findings…
Taxonomy
Topics: Radioactive Decay and Measurement Techniques
Methods: MixText
