On the Generalization of Training-based ChatGPT Detection Methods
Han Xu, Jie Ren, Pengfei He, Shenglai Zeng, Yingqian Cui, Amy Liu, Hui Liu, Jiliang Tang

TL;DR
This paper investigates how well training-based ChatGPT detection methods generalize across different tasks, prompts, and topics, revealing their limitations and providing insights for future improvements.
Contribution
It offers a comprehensive analysis of the generalization behaviors of ChatGPT detection methods under distribution shifts, supported by a new dataset and extensive experiments.
Findings
Detection models struggle with unseen prompts and topics.
Distribution shifts significantly reduce detection accuracy.
Insights guide future development of more robust detection methods.
Abstract
ChatGPT is one of the most popular language models and achieves strong performance on a variety of natural language tasks. Consequently, there is an urgent need to distinguish texts generated by ChatGPT from human-written ones. One extensively studied approach trains classification models to tell the two apart. However, existing studies also show that the trained models may suffer from distribution shifts at test time, i.e., they are ineffective at detecting generated texts from unseen language tasks or topics. In this work, we aim to comprehensively investigate these methods' generalization behaviors under distribution shifts caused by a wide range of factors, including prompts, text lengths, topics, and language tasks. To achieve this goal, we first collect a new dataset of human and ChatGPT texts, and then conduct extensive studies on the collected dataset. Our…
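The evaluation protocol described in the abstract (train a detector on human and ChatGPT texts, then compare its accuracy on test data from the training distribution against data from an unseen topic or task) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the `NaiveBayesDetector` class, the `shift_gap` helper, and the toy strings are hypothetical stand-ins, and a simple bag-of-words Naive Bayes model replaces whatever classifiers the authors actually trained.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; any tokenizer works)."""
    return text.lower().split()


class NaiveBayesDetector:
    """Bag-of-words Naive Bayes, a stand-in for any training-based detector."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for c in self.classes:
            self.vocab.update(self.word_counts[c])
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus Laplace-smoothed log likelihood of known tokens.
            score = math.log(self.doc_counts[c] / total_docs)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens unseen in training are skipped
                    score += math.log((self.word_counts[c][tok] + 1) / denom)
            if score > best_score:
                best, best_score = c, score
        return best


def accuracy(model, texts, labels):
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(texts)


def shift_gap(model, train, in_dist, out_dist):
    """Fit on `train`, then return (in-dist acc, shifted acc, gap).

    Each argument after `model` is a (texts, labels) pair.
    """
    model.fit(*train)
    acc_in = accuracy(model, *in_dist)
    acc_out = accuracy(model, *out_dist)
    return acc_in, acc_out, acc_in - acc_out


if __name__ == "__main__":
    # Toy placeholder strings, NOT taken from the paper's dataset.
    train = (["aaa bbb", "ccc ddd"], ["human", "chatgpt"])
    in_dist = (["aaa", "ddd"], ["human", "chatgpt"])
    out_dist = (["ccc", "bbb"], ["human", "chatgpt"])
    print(shift_gap(NaiveBayesDetector(), train, in_dist, out_dist))
```

A positive gap means the detector performs worse on the shifted split than on the in-distribution split. In a real study, the three splits would be built by holding out a prompt, topic, text-length range, or task rather than by hand.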
Peer Reviews
Decision · Submitted to ICLR 2024
- **Comprehensive Investigation**: The paper conducts a thorough analysis of the generalization behaviors of existing methods under distribution shifts caused by various factors like prompts, text lengths, topics, and language tasks.
- **New Dataset**: The authors contribute to the field by collecting a new dataset containing both human and ChatGPT-generated texts, facilitating in-depth studies on the detection methods.
- **Insightful Findings**: The research uncovers insightful findings, prov…
- The authors used three prompts in Figure 1, but in general users might use a wide variety of prompts, so the setup is not aligned with real usage.
- The experiments are limited to ChatGPT; we do not know whether these conclusions still hold for GPT-4 or open-source LLMs.
- Most findings seem obvious.
* The authors present a novel dataset.
* The analysis is detailed and comprehensive.
* They provide insights on the data collection and domain adaptation strategy.
* ChatGPT detection does not seem well motivated. The paper needs a why, not just a what and a how.
* This work exclusively discusses training-based methods, which narrows its scope.
This work studies an important and timely problem: detecting LLM-generated content. Experiments are conducted for in-distribution settings as well as OOD settings involving content length shift and topic/domain shift. The feature attribution analysis is an interesting and novel angle of study in the context of LLM-generated content detection.
The study seeks to detect content generated by ChatGPT, which is just an interface whose backend model keeps evolving. Hence, it is hard to say whether the experimental results and analysis are reproducible and sustainable. In my opinion, this type of study should be conducted on a static LLM. Length shift and domain/topic shift represent limited types of distribution shift that are easily detectable. The authors could have considered more implicit shifts where content is paraphrased with sy…
Taxonomy
Topics: Topic Modeling · Text Readability and Simplification · Artificial Intelligence in Healthcare and Education
