Detecting AI-Generated Sentences in Human-AI Collaborative Hybrid Texts: Challenges, Strategies, and Insights
Zijie Zeng, Shiqi Liu, Lele Sha, Zhuang Li, Kaixun Yang, Sannyuya Liu, Dragan Gašević, Guanliang Chen

TL;DR
This paper investigates the challenges of detecting AI-generated sentences within human-AI hybrid texts using realistic datasets, proposing a segmentation-based approach and highlighting key difficulties in authorship identification.
Contribution
It introduces a segmentation-based pipeline for detecting AI-generated sentences in realistic hybrid texts and provides empirical insights into the challenges involved.
Findings
Detecting AI-generated sentences is challenging due to editing and stylistic variability.
Frequent authorship changes within texts complicate segmentation.
Segment length influences the choice of detection strategy.
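To make "authorship changes" and "segment length" concrete, here is a minimal Python sketch. It assumes sentence-level authorship labels are already available; the label strings "human" and "ai", the example sequence, and the function name `segments_from_labels` are illustrative assumptions, not taken from the paper.

```python
from itertools import groupby

def segments_from_labels(labels):
    """Group consecutive identical labels into (label, length) segments."""
    return [(label, len(list(run))) for label, run in groupby(labels)]

# Hypothetical sentence-level authorship labels for one hybrid text.
labels = ["human", "human", "ai", "human", "ai", "ai", "ai"]

segments = segments_from_labels(labels)
# Each change of author between adjacent sentences is one boundary.
num_boundaries = len(segments) - 1
```

In this toy sequence, a text with seven sentences already contains three boundaries and two single-sentence segments, which is the kind of short, frequently alternating structure the findings describe as hard to segment.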
Abstract
This study explores the challenge of sentence-level AI-generated text detection within human-AI collaborative hybrid texts. Existing studies of AI-generated text detection for hybrid texts often rely on synthetic datasets. These typically involve hybrid texts with a limited number of boundaries. We contend that studies of detecting AI-generated content within hybrid texts should cover different types of hybrid texts generated in realistic settings to better inform real-world applications. Therefore, our study utilizes the CoAuthor dataset, which includes diverse, realistic hybrid texts generated through the collaboration between human writers and an intelligent writing system in multi-turn interactions. We adopt a two-step, segmentation-based pipeline: (i) detect segments within a given hybrid text where each segment contains sentences of consistent authorship, and (ii) classify the…
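The overall shape of a two-step, segmentation-based pipeline can be sketched as follows. This is a structural illustration only: the abstract is cut off at step (ii), so the sketch assumes that step assigns one authorship label per segment. The names `split_at_boundaries`, `label_sentences`, and `classify_segment` are hypothetical, and the boundary indices are taken as given rather than predicted by any real detector.

```python
from typing import Callable, List, Sequence

def split_at_boundaries(sentences: Sequence[str],
                        boundaries: Sequence[int]) -> List[List[str]]:
    """Step (i) output: a boundary b means a new segment starts at index b."""
    cuts = [0] + sorted(boundaries) + [len(sentences)]
    # Skip empty slices caused by boundaries at 0, at the end, or duplicated.
    return [list(sentences[a:b]) for a, b in zip(cuts, cuts[1:]) if a < b]

def label_sentences(sentences: Sequence[str],
                    boundaries: Sequence[int],
                    classify_segment: Callable[[List[str]], str]) -> List[str]:
    """Step (ii), as assumed here: label each segment, then copy that label
    to every sentence inside the segment."""
    labels: List[str] = []
    for segment in split_at_boundaries(sentences, boundaries):
        labels.extend([classify_segment(segment)] * len(segment))
    return labels

# Placeholder classifier with no detection value; a real system would
# plug in a trained model here.
def placeholder_classifier(segment: List[str]) -> str:
    return "ai" if len(segment) > 1 else "human"

sentence_labels = label_sentences(["s0", "s1", "s2", "s3"], [1, 3],
                                  placeholder_classifier)
```

The design point is that the classifier sees a whole segment rather than a single sentence, so errors in the boundaries from step (i) propagate directly into the sentence-level labels.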
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
