SeqXGPT: Sentence-Level AI-Generated Text Detection

Pengyu Wang; Linyang Li; Ke Ren; Botian Jiang; Dong Zhang; Xipeng Qiu

arXiv:2310.08903 · cs.CL · December 18, 2023 · 6 citations

TL;DR

SeqXGPT introduces a novel sentence-level AI-generated text detection method using LLM log probabilities, outperforming existing approaches and demonstrating strong generalization in both sentence and document detection tasks.
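
The TL;DR refers to both sentence-level and document-level detection. One simple way to move between these granularities is voting: per-token predictions decide each sentence's label, and sentence labels decide the document's label. The following is a minimal sketch under that assumption, not a description of SeqXGPT's exact procedure; the helper names, the label strings "human"/"ai", and the 0.5 document threshold are all hypothetical.

```python
from collections import Counter


def sentence_labels(token_labels, sentence_ids):
    """Give each sentence the majority label of its tokens (hypothetical helper).

    token_labels: per-token predictions, e.g. "human" or "ai" (assumed label names).
    sentence_ids: the sentence index of each token, same length as token_labels.
    """
    if len(token_labels) != len(sentence_ids):
        raise ValueError("token_labels and sentence_ids must have the same length")
    votes = {}
    for label, sid in zip(token_labels, sentence_ids):
        votes.setdefault(sid, Counter())[label] += 1
    # Counter.most_common breaks ties in first-encountered order.
    return {sid: counter.most_common(1)[0][0] for sid, counter in sorted(votes.items())}


def document_label(sent_labels, threshold=0.5):
    """Hypothetical rule: call a document "ai" if more than `threshold` of its sentences are "ai"."""
    if not sent_labels:
        raise ValueError("document has no sentences")
    ai_fraction = sum(1 for lab in sent_labels.values() if lab == "ai") / len(sent_labels)
    return "ai" if ai_fraction > threshold else "human"
```

For example, token labels `["human", "human", "ai", "ai", "ai", "human"]` with sentence ids `[0, 0, 1, 1, 1, 1]` give sentence 0 the label "human" and sentence 1 the label "ai". Voting keeps the sentence decision robust to a few mislabeled tokens; the document threshold is a free design choice.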

Contribution

The paper introduces the first sentence-level AI-generated text (AIGT) detection challenge and proposes SeqXGPT, a model that feeds log-probability features from LLMs into convolution and self-attention layers, surpassing prior methods.
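
The log-probability features can be illustrated with a short sketch. For a causal language model, the score vector at position t predicts token t+1, so each observed token's log-probability is the log-softmax of that vector read at the token's id. In practice the logits would come from a white-box LLM; here they are plain lists so the example runs without any model. Stacking one list per LLM into per-position feature vectors is an assumption of this sketch, and the function names are hypothetical. It also assumes the lists are already aligned to the same positions (models with different tokenizers would need an alignment step, not shown).

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax of one score vector."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def token_log_probs(logits, token_ids):
    """Per-token log-probability list (hypothetical helper).

    logits[t] is the score vector at position t, which predicts token_ids[t + 1]
    (causal LM convention). Returns log p(token_ids[t + 1] | token_ids[: t + 1])
    for t = 0 .. len(token_ids) - 2.
    """
    if len(logits) != len(token_ids):
        raise ValueError("need one score vector per input token")
    return [log_softmax(logits[t])[token_ids[t + 1]] for t in range(len(token_ids) - 1)]


def stack_features(per_model_lists):
    """Combine one log-prob list per model into per-position feature vectors (assumed layout)."""
    if len({len(lst) for lst in per_model_lists}) != 1:
        raise ValueError("all lists must have the same length (assumes a shared alignment)")
    return [list(step) for step in zip(*per_model_lists)]
```

With uniform scores over a four-token vocabulary, every entry of the list equals log(1/4); a token the model strongly expects gets a log-probability close to 0.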

Findings

01 SeqXGPT significantly outperforms baseline methods in detection accuracy.

02 The method generalizes well across different datasets and detection levels.

03 Sentence-level detection remains challenging for previous approaches.

Abstract

Widely applied large language models (LLMs) can generate human-like content, raising concerns about the abuse of LLMs. Therefore, it is important to build strong AI-generated text (AIGT) detectors. Current works only consider document-level AIGT detection; therefore, in this paper, we first introduce a sentence-level detection challenge by synthesizing a dataset of documents polished with LLMs, that is, documents containing both sentences written by humans and sentences modified by LLMs. Then we propose SeqXGPT (Sequence X (Check) GPT), a novel method that utilizes log probability lists from white-box LLMs as features for sentence-level AIGT detection. These features are composed like "waves" in speech processing and cannot be studied by LLMs. Therefore, we build SeqXGPT based on convolution and self-attention networks. We test it in both sentence…
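
The abstract treats the stacked log-probability lists as a wave-like signal and processes them with convolution and self-attention. The NumPy sketch below shows that idea as a per-token labeler: a 1-D convolution picks up local patterns along the sequence (much as 1-D convolutions do on audio), and a self-attention layer lets every position draw on the whole document. It is illustrative only and not SeqXGPT's actual architecture (the official PyTorch code is in the jihuai-wpy/seqxgpt repository); the single convolution layer, single attention head, residual connection, layer sizes, and random untrained weights are all assumptions.

```python
import numpy as np


def conv1d_same(x, kernels, bias):
    """1-D convolution over time with zero 'same' padding.

    x: (T, C_in) features; kernels: (K, C_in, C_out) with odd K; bias: (C_out,).
    """
    k = kernels.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad only the time axis
    out = np.empty((x.shape[0], kernels.shape[2]))
    for t in range(x.shape[0]):
        # Each output step sees a window of K neighbouring positions.
        out[t] = np.einsum("kc,kcd->d", xp[t:t + k], kernels) + bias
    return out


def self_attention(h, wq, wk, wv):
    """Single-head scaled dot-product self-attention over all positions."""
    q, k, v = h @ wq, h @ wk, h @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ v


def init_params(c_in, hidden, n_labels, kernel_size=3, seed=0):
    """Random, untrained weights; every size here is an illustrative assumption."""
    rng = np.random.default_rng(seed)

    def w(*shape):
        return rng.normal(0.0, 0.1, shape)

    return {
        "conv_w": w(kernel_size, c_in, hidden), "conv_b": np.zeros(hidden),
        "wq": w(hidden, hidden), "wk": w(hidden, hidden), "wv": w(hidden, hidden),
        "out_w": w(hidden, n_labels), "out_b": np.zeros(n_labels),
    }


def forward(features, p):
    """features: (T, n_models) stacked log-prob lists -> (T, n_labels) per-token logits."""
    h = np.maximum(conv1d_same(features, p["conv_w"], p["conv_b"]), 0.0)  # conv + ReLU
    h = h + self_attention(h, p["wq"], p["wk"], p["wv"])  # residual attention block
    return h @ p["out_w"] + p["out_b"]


# Usage: 12 token positions, log-prob lists from 4 hypothetical white-box LLMs.
feats = np.random.default_rng(1).normal(-3.0, 1.0, size=(12, 4))
logits = forward(feats, init_params(c_in=4, hidden=16, n_labels=2))
pred = logits.argmax(axis=-1)  # per-token label ids (0/1 meaning human/AI is assumed)
```

Per-token predictions like `pred` would still need training against labeled data and an aggregation step to yield sentence labels; the convolution handles local shape while attention supplies document-wide context.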

Code & Models

Repositories

jihuai-wpy/seqxgpt (PyTorch, official)

Models

zcahjl3/seqxgpt-detector (Hugging Face model)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Hate Speech and Cyberbullying Detection

Methods: Convolution