To ChatGPT, or not to ChatGPT: That is the question!
Alessandro Pegoraro, Kavita Kumari, Hossein Fereidooni, Ahmad-Reza Sadeghi

TL;DR
This study evaluates recent ChatGPT detection methods using a new benchmark dataset and finds that current techniques are ineffective at reliably distinguishing AI-generated text from human writing.
Contribution
It provides a comprehensive assessment of recent detection techniques and introduces a diverse benchmark dataset for evaluating ChatGPT detection performance.
Findings
Existing detection methods are largely ineffective.
A new benchmark dataset was curated for evaluation.
Detection accuracy remains low across methods.
Abstract
ChatGPT has become a global sensation. As ChatGPT and other Large Language Models (LLMs) emerge, concerns about their misuse grow, including the dissemination of fake news, plagiarism, manipulation of public opinion, cheating, and fraud. Hence, distinguishing AI-generated text from human-generated text becomes increasingly essential. Researchers have proposed various detection methodologies, ranging from basic binary classifiers to more complex deep-learning models. Some detection techniques rely on statistical characteristics or syntactic patterns, while others incorporate semantic or contextual information to improve accuracy. The primary objective of this study is to provide a comprehensive and contemporary assessment of the most recent techniques in ChatGPT detection. Additionally, we evaluated other AI-generated text detection tools that do not specifically claim to detect…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Text Readability and Simplification
Methods: None
