Online Detection of LLM-Generated Texts via Sequential Hypothesis Testing by Betting

Can Chen; Jun-Kun Wang

arXiv:2410.22318 · cs.LG · June 9, 2025

TL;DR

This paper introduces an online algorithm for detecting LLM-generated texts using sequential hypothesis testing, enabling fast decisions with statistical guarantees when content arrives as a stream.

Contribution

The paper proposes a new online detection method based on sequential hypothesis testing by betting; it comes with statistical guarantees on the false positive rate and applies to streaming data.
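To make the idea concrete, here is a minimal Python sketch of a generic sequential two-sample test by betting. It is not the authors' implementation. It assumes that each incoming text from the monitored source is paired with a text from a human reference pool, and that some detector (left unspecified here) has already mapped both texts to scores in [0, 1]. The function name `betting_test`, the pairing scheme, and the online Newton step (ONS) rule for choosing the bet (a common choice in the testing-by-betting literature) are all assumptions for illustration. The bettor starts with wealth 1, multiplies it by 1 + λ(x - y) after each pair, and flags the source as an LLM once wealth reaches 1/α.

```python
import math
import random
from typing import Iterable, Optional, Tuple

# Step-size constant of the ONS betting rule (an assumed choice, not from the paper).
ONS_CONST = 2.0 / (2.0 - math.log(3.0))


def betting_test(
    pairs: Iterable[Tuple[float, float]],
    alpha: float = 0.05,
) -> Tuple[Optional[int], float]:
    """Hypothetical sequential two-sample test by betting.

    pairs: stream of (source_score, human_score), each assumed to lie in [0, 1].
    H0 (assumed): the source is human, so source and human scores share a mean.
    Returns (1-based index of the pair at which H0 is rejected, or None; final wealth).
    """
    wealth = 1.0    # the bettor starts with unit wealth
    lam = 0.0       # current bet, fixed before the next pair is seen
    grad_sq = 1.0   # ONS accumulator of squared gradients, initialised to 1
    for t, (x, y) in enumerate(pairs, start=1):
        if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
            raise ValueError("scores must lie in [0, 1]")
        g = x - y                   # payoff in [-1, 1]
        wealth *= 1.0 + lam * g     # settle the bet placed before seeing g
        if wealth >= 1.0 / alpha:   # reject H0: flag the source as an LLM
            return t, wealth
        # ONS step on the loss -log(1 + lam * g); z is minus its gradient in lam.
        z = g / (1.0 + lam * g)
        grad_sq += z * z
        lam = max(-0.5, min(0.5, lam + ONS_CONST * z / grad_sq))
    return None, wealth


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical LLM source: its detector scores run higher than human ones.
    llm_stream = [(rng.uniform(0.4, 1.0), rng.uniform(0.0, 0.6)) for _ in range(1000)]
    print("LLM-like source:", betting_test(llm_stream))
    # Hypothetical human source: same score distribution as the human reference.
    human_stream = [(rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)) for _ in range(1000)]
    print("Human-like source:", betting_test(human_stream))
```

Clipping λ to [-1/2, 1/2] keeps every factor 1 + λ(x - y) at least 1/2, so wealth never reaches zero and a single atypical text cannot bankrupt the bettor; the test simply keeps watching the stream until the evidence is strong enough.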

Findings

01. Effective real-time detection of LLM-generated texts
02. Controlled false positive rate
03. Fast identification of LLM sources
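The false-positive control listed among the findings is what betting-based tests typically obtain from Ville's inequality. The derivation below is the standard argument, written in assumed notation (K_t for the bettor's wealth, λ_t for the bet, g_t ∈ [-1, 1] for the payoff at step t, and F_{t-1} for the history before step t); it is not the paper's exact theorem, whose assumptions may differ.

```latex
\begin{aligned}
&K_0 = 1, \qquad K_t = K_{t-1}\,(1 + \lambda_t g_t), \qquad
  \lambda_t \in [-\tfrac12, \tfrac12] \text{ chosen from } \mathcal{F}_{t-1},\\
&\text{under } H_0:\ \mathbb{E}[g_t \mid \mathcal{F}_{t-1}] = 0
  \;\Longrightarrow\;
  \mathbb{E}[K_t \mid \mathcal{F}_{t-1}]
  = K_{t-1}\bigl(1 + \lambda_t\,\mathbb{E}[g_t \mid \mathcal{F}_{t-1}]\bigr)
  = K_{t-1},\\
&\text{so } (K_t)_{t \ge 0} \text{ is a nonnegative martingale, and Ville's inequality gives}\\
&\Pr_{H_0}\bigl(\exists\, t \ge 1 : K_t \ge 1/\alpha\bigr) \le \alpha\,\mathbb{E}[K_0] = \alpha .
\end{aligned}
```

Because the bound holds uniformly over t, the rule "flag the source as soon as K_t ≥ 1/α" keeps the false positive rate at most α no matter how long the stream is monitored.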

Abstract

Developing algorithms to differentiate between machine-generated texts and human-written texts has garnered substantial attention in recent years. Existing methods in this direction typically concern an offline setting where a dataset containing a mix of real and machine-generated texts is given upfront, and the task is to determine whether each sample in the dataset is from a large language model (LLM) or a human. However, in many practical scenarios, sources such as news websites, social media accounts, and online forums publish content in a streaming fashion. Therefore, in this online scenario, how to quickly and accurately determine whether the source is an LLM with strong statistical guarantees is crucial for these media or platforms to function effectively and prevent the spread of misinformation and other potential misuse of LLMs. To tackle the problem of online detection, we…

Code & Models

Repositories

canchen-cc/online-llm-detection
PyTorch · Official

Videos

Online Detection of LLM-Generated Texts via Sequential Hypothesis Testing by Betting · SlidesLive

Taxonomy

Topics: Natural Language Processing Techniques · Text and Document Classification Technologies · Handwritten Text Recognition Techniques

Methods: Softmax · Attention Is All You Need