On the Generalization of Training-based ChatGPT Detection Methods

Han Xu; Jie Ren; Pengfei He; Shenglai Zeng; Yingqian Cui; Amy Liu; Hui Liu; Jiliang Tang

arXiv:2310.01307·cs.CL·October 4, 2023

TL;DR

This paper investigates how well training-based ChatGPT detection methods generalize across different tasks, prompts, and topics, revealing their limitations and providing insights for future improvements.

Contribution

It offers a comprehensive analysis of the generalization behaviors of ChatGPT detection methods under distribution shifts, supported by a new dataset and extensive experiments.

Findings

01 Detection models struggle with unseen prompts and topics.

02 Distribution shifts significantly reduce detection accuracy.

03 Insights guide future development of more robust detection methods.

Abstract

ChatGPT is one of the most popular language models, achieving impressive performance on various natural language tasks. Consequently, there is an urgent need to distinguish texts generated by ChatGPT from human-written ones. One extensively studied approach trains classification models to tell the two apart. However, existing studies also show that the trained models may suffer from distribution shifts at test time, i.e., they are ineffective at detecting generated texts from unseen language tasks or topics. In this work, we aim to comprehensively investigate these methods' generalization behaviors under distribution shifts caused by a wide range of factors, including prompts, text lengths, topics, and language tasks. To achieve this goal, we first collect a new dataset with human and ChatGPT texts, and then we conduct extensive studies on the collected dataset. Our…
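The evaluation setup described in the abstract (train a classifier on one distribution, then test it on a shifted one) can be sketched as follows. This is a minimal illustration, not the paper's method: the naive Bayes detector is a stand-in chosen only because it needs no external libraries, and every sentence below is invented toy data, not drawn from the paper's dataset. The shifted examples are deliberately constructed to trip the detector, so the numbers it prints say nothing about real detection accuracy.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenization is enough for this toy example.
    return text.lower().split()


class NaiveBayesDetector:
    """Tiny multinomial naive Bayes over word counts (illustrative stand-in)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for c in self.classes:
            self.vocab.update(self.word_counts[c])
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for c in self.classes:
            score = math.log(self.doc_counts[c] / total_docs)
            # Laplace (add-one) smoothing over the training vocabulary.
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for w in tokenize(text):
                if w in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[c][w] + 1) / denom)
            if score > best_score:
                best, best_score = c, score
        return best


def accuracy(model, examples):
    correct = sum(model.predict(text) == label for text, label in examples)
    return correct / len(examples)


# Hypothetical training topic: restaurant reviews (all sentences invented).
train = [
    ("the food was great lol", "human"),
    ("honestly the service was slow", "human"),
    ("overall the restaurant offers a delightful experience", "chatgpt"),
    ("in conclusion the menu provides excellent value", "chatgpt"),
]
# Held-out examples from the same topic and style.
in_distribution = [
    ("the food was slow lol", "human"),
    ("overall the menu offers excellent value", "chatgpt"),
]
# Hypothetical shifted topic (medical advice) with a style shift as well:
# a formal human text and a casual machine text.
shifted = [
    ("my doctor said rest lol", "human"),
    ("overall the treatment provides excellent relief", "human"),
    ("in conclusion patients should consult a physician", "chatgpt"),
    ("the medication was helpful honestly", "chatgpt"),
]

model = NaiveBayesDetector().fit([t for t, _ in train], [y for _, y in train])
in_dist_acc = accuracy(model, in_distribution)
shifted_acc = accuracy(model, shifted)
print(f"in-distribution accuracy: {in_dist_acc:.2f}")
print(f"shifted accuracy: {shifted_acc:.2f}")
```

The detector here keys on surface vocabulary that happens to correlate with the label in the training topic, which is one plausible way such a classifier could fail once the topic or prompt style changes.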

Peer Reviews

Decision·Submitted to ICLR 2024

Reviewer 01 · Rating 5 · marginally below the acceptance threshold · Confidence 4

Strengths

- **Comprehensive Investigation**: The paper conducts a thorough analysis of the generalization behaviors of existing methods under distribution shifts caused by various factors like prompts, text lengths, topics, and language tasks.
- **New Dataset**: The authors contribute to the field by collecting a new dataset containing both human and ChatGPT-generated texts, facilitating in-depth studies on the detection methods.
- **Insightful Findings**: The research uncovers insightful findings, prov…

Weaknesses

- The authors used three prompts in Figure 1, but in general users might use a wide variety of prompts. This is not aligned with real user usage.
- The experiments are limited to ChatGPT; we do not know whether these conclusions still hold for GPT-4 or other open-source LLMs.
- Most findings seem obvious.

Reviewer 02 · Rating 5 · marginally below the acceptance threshold · Confidence 4

Strengths

* The authors present a novel dataset.
* The analysis is detailed and comprehensive.
* They provide insights on the data collection and domain adaptation strategy.

Weaknesses

* The ChatGPT direction does not seem well motivated. It needs a why, not just a what and a how.
* This work exclusively discusses training-based methods, which limits its scope.

Reviewer 03 · Rating 6 · marginally above the acceptance threshold · Confidence 2

Strengths

This work studies an important and timely problem: detecting LLM-generated content. Experiments are conducted in in-distribution settings as well as OOD settings involving content length shift and topic/domain shift. The feature attribution analysis is an interesting and novel angle of study in the context of LLM-generated content detection.

Weaknesses

The study seeks to detect content generated by ChatGPT, which is just an interface whose backend model keeps evolving. Hence, it is hard to say whether the experimental results and analysis are reproducible and sustainable. In my opinion, this type of study should be conducted on a static LLM. Length shift and domain/topic shift represent limited types of distribution shift that are easily detectable. The authors could have considered more implicit shifts where content is paraphrased with sy…

Code & Models

Repositories

hannxu123/hcvar
pytorch · Official

Datasets

hannxu/hc_var
dataset · 51 dl

Taxonomy

Topics: Topic Modeling · Text Readability and Simplification · Artificial Intelligence in Healthcare and Education