GPT-generated Text Detection: Benchmark Dataset and Tensor-based Detection Method
Zubair Qazi, William Shiao, and Evangelos E. Papalexakis

TL;DR
This paper introduces GRiD, a new dataset for detecting GPT-generated text, and proposes GpTen, a tensor-based semi-supervised detection method that performs comparably to fully-supervised models.
Contribution
The paper contributes GRiD, a novel GPT-generated text detection dataset built from Reddit context-prompt pairs with human-written and ChatGPT-generated responses, and GpTen, a semi-supervised tensor-based detection method.
Findings
GpTen performs on par with fully-supervised baselines.
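The paper analyzes GRiD's characteristics, including linguistic diversity, context complexity, and response quality.
Benchmarking several detection methods on GRiD demonstrates their efficacy in distinguishing human-written from ChatGPT-generated responses.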
Abstract
As natural language models like ChatGPT become increasingly prevalent in applications and services, the need for robust and accurate methods to detect their output is of paramount importance. In this paper, we present GPT Reddit Dataset (GRiD), a novel Generative Pretrained Transformer (GPT)-generated text detection dataset designed to assess the performance of detection models in identifying generated responses from ChatGPT. The dataset consists of a diverse collection of context-prompt pairs based on Reddit, with human-generated and ChatGPT-generated responses. We provide an analysis of the dataset's characteristics, including linguistic diversity, context complexity, and response quality. To showcase the dataset's utility, we benchmark several detection methods on it, demonstrating their efficacy in distinguishing between human and ChatGPT-generated responses. This dataset serves as…
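Detection on GRiD amounts to binary classification: each response is either human-written or ChatGPT-generated. As a minimal sketch (not the paper's evaluation code), the function below scores a detector's predictions with standard binary metrics. The label encoding (1 = ChatGPT-generated, 0 = human-written) and the name `detection_metrics` are hypothetical choices for illustration.

```python
from typing import Sequence


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Score a binary detector.

    Assumed (hypothetical) encoding: 1 = ChatGPT-generated, 0 = human-written.
    Precision, recall, and F1 treat the ChatGPT-generated class as positive.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix counts for the positive (ChatGPT-generated) class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    total = tp + fp + fn + tn
    # Guard every division so empty or one-sided inputs return 0.0.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels and predictions, not GRiD data.
    print(detection_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
```

Reporting F1 alongside accuracy is a common choice when the two classes may be imbalanced, since accuracy alone can hide a detector that rarely flags generated text.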
Taxonomy
Topics: Machine Learning and Data Classification · Topic Modeling
Methods: Attention Is All You Need · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Softmax · Residual Connection · Weight Decay · Linear Layer · Dense Connections · Label Smoothing
