LLMDet: A Third Party Large Language Models Generated Text Detection Tool

Kangxi Wu; Liang Pang; Huawei Shen; Xueqi Cheng; Tat-Seng Chua

arXiv:2305.15004 · cs.CL · November 6, 2023 · 2 cites

TL;DR

LLMDet is a practical, fast, and extendable detection tool that identifies which specific large language model generated a given text, unlike existing tools, which typically require access to the LLMs and only separate machine-generated from human-authored text.

Contribution

The paper introduces LLMDet, a detection tool that attributes a text to the specific LLM that generated it, and reports better accuracy, speed, and extendability than existing tools.

Findings

01 Achieves 98.54% precision in detection
02 Operates 5 times faster than previous methods
03 Easily extendable to new open-source models

Abstract

Generated texts from large language models (LLMs) are remarkably close to high-quality human-authored text, raising concerns about their potential misuse in spreading false information and academic misconduct. Consequently, there is an urgent need for a highly practical detection tool capable of accurately identifying the source of a given text. However, existing detection tools typically rely on access to LLMs and can only differentiate between machine-generated and human-authored text, failing to meet the requirements of fine-grained tracing, intermediary judgment, and rapid detection. Therefore, we propose LLMDet, a model-specific, secure, efficient, and extendable detection tool that can source text from specific LLMs, such as GPT-2, OPT, LLaMA, and others. In LLMDet, we record the next-token probabilities of salient n-grams as features to calculate proxy perplexity for each LLM.…
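The proxy-perplexity idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy dictionaries, the names `TOY_DICTIONARIES`, `FLOOR_PROB`, `proxy_perplexity`, and `rank_sources`, the longest-context backoff rule, and the floor probability are all assumptions made for illustration. The abstract says only that next-token probabilities of salient n-grams are recorded per LLM and used to compute a proxy perplexity; ranking candidates by lowest proxy perplexity is a simple stand-in for however the scores are actually combined.

```python
import math

# Hypothetical per-model dictionaries: context n-gram -> {next token: probability}.
# In practice these would be recorded from each LLM; the values here are made up.
TOY_DICTIONARIES = {
    "model_a": {
        ("the",): {"cat": 0.5, "dog": 0.25},
        ("the", "cat"): {"sat": 0.5},
    },
    "model_b": {
        ("the",): {"cat": 0.125, "dog": 0.5},
    },
}

# Assumed fallback probability when no recorded n-gram covers a token.
FLOOR_PROB = 1e-4


def proxy_perplexity(tokens, ngram_probs, max_context=2, floor=FLOOR_PROB):
    """Return exp(mean negative log-probability) using looked-up probabilities.

    No model is run: each next-token probability comes from the dictionary,
    trying the longest available context first and backing off to shorter ones.
    """
    if len(tokens) < 2:
        raise ValueError("need at least two tokens to score a sequence")
    total_log_prob = 0.0
    for i in range(1, len(tokens)):
        prob = floor
        for n in range(min(max_context, i), 0, -1):
            context = tuple(tokens[i - n:i])
            next_probs = ngram_probs.get(context)
            if next_probs is not None and tokens[i] in next_probs:
                prob = next_probs[tokens[i]]
                break
        total_log_prob += math.log(prob)
    return math.exp(-total_log_prob / (len(tokens) - 1))


def rank_sources(tokens, dictionaries):
    """Sort candidate models by ascending proxy perplexity (an assumed rule)."""
    scores = {
        name: proxy_perplexity(tokens, probs)
        for name, probs in dictionaries.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for name, score in rank_sources(["the", "cat", "sat"], TOY_DICTIONARIES):
        print(f"{name}: {score:.2f}")
```

On the toy input, `model_a` covers both transitions with probability 0.5 and scores a proxy perplexity of about 2.0, while `model_b` falls back to the floor for the second transition and scores far higher, so `model_a` is ranked first. Because scoring needs only dictionary lookups, the detector never has to query the candidate LLMs themselves.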

Code & Models

Repositories

trustedllm/llmdet
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Attention Is All You Need · Cosine Annealing · Softmax · Layer Normalization · Byte Pair Encoding · Dropout · Linear Layer · Attention Dropout · Adam · Dense Connections