From Text to Source: Results in Detecting Large Language Model-Generated Content

Wissam Antoun; Benoît Sagot; Djamé Seddah

arXiv:2309.13322·cs.CL·March 28, 2024·1 cites

TL;DR

This paper evaluates the ability of classifiers to detect and attribute text generated by large language models, revealing challenges with larger models and highlighting the potential of watermarking for source identification.

Contribution

It introduces a comprehensive analysis of cross-model detection and attribution, emphasizing the effects of model size, training data, and watermarking techniques on detection performance.

Findings

01. Detection effectiveness decreases as the size of the generating model increases.

02. Training the classifier on text from similarly sized models improves detection of text from larger models.

03. Watermarking shows promising results in source attribution.

Abstract

The widespread use of Large Language Models (LLMs), celebrated for their ability to generate human-like text, has raised concerns about misinformation and ethical implications. Addressing these concerns necessitates the development of robust methods to detect and attribute text generated by LLMs. This paper investigates "Cross-Model Detection" by evaluating whether a classifier trained to distinguish between source LLM-generated and human-written text can also detect text from a target LLM without further training. The study comprehensively explores various LLM sizes and families, and assesses the impact of conversational fine-tuning techniques, quantization, and watermarking on classifier generalization. The research also explores Model Attribution, encompassing source model identification, model family, and model size classification, in addition to quantization and watermarking…

Taxonomy

Topics: Topic Modeling · Hate Speech and Cyberbullying Detection · Natural Language Processing Techniques