From Text to Source: Results in Detecting Large Language Model-Generated Content
Wissam Antoun, Benoît Sagot, Djamé Seddah

TL;DR
This paper evaluates how well classifiers can detect and attribute text generated by large language models. It finds that text from larger models is harder to detect and highlights the potential of watermarking for source identification.
Contribution
It introduces a comprehensive analysis of cross-model detection and attribution, emphasizing the effects of model size, training data, and watermarking techniques on detection performance.
Findings
Detection effectiveness decreases as the size of the generating model increases.
Training the classifier on text from similarly sized models improves detection of text from larger models.
Watermarking shows promising results in source attribution.
Abstract
The widespread use of Large Language Models (LLMs), celebrated for their ability to generate human-like text, has raised concerns about misinformation and ethical implications. Addressing these concerns necessitates the development of robust methods to detect and attribute text generated by LLMs. This paper investigates "Cross-Model Detection" by evaluating whether a classifier trained to distinguish between source LLM-generated and human-written text can also detect text from a target LLM without further training. The study comprehensively explores various LLM sizes and families, and assesses the impact of conversational fine-tuning techniques, quantization, and watermarking on classifier generalization. The research also explores Model Attribution, encompassing source model identification, model family, and model size classification, in addition to quantization and watermarking…
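The cross-model detection setup described in the abstract can be illustrated with a minimal sketch: fit a detector on human-written text versus text from one source model, then score it on text from a different target model without retraining. Everything below is an assumption made for illustration. The tiny bag-of-words Naive Bayes classifier, the names `NaiveBayesDetector` and `cross_model_accuracy`, and the hand-written toy sentences are hypothetical stand-ins, not the paper's classifier or data.

```python
import math
from collections import Counter


class NaiveBayesDetector:
    """Toy bag-of-words detector (illustrative stand-in, not the paper's model).

    Label 1 = machine-generated, label 0 = human-written.
    """

    def fit(self, texts, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.docs = Counter(labels)
        for text, label in zip(texts, labels):
            self.counts[label].update(text.lower().split())
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def _log_score(self, text, label):
        total = sum(self.counts[label].values())
        score = math.log(self.docs[label] / sum(self.docs.values()))
        for tok in text.lower().split():
            # Laplace smoothing so unseen tokens do not zero out the score.
            score += math.log(
                (self.counts[label][tok] + 1) / (total + len(self.vocab) + 1)
            )
        return score

    def predict(self, texts):
        return [int(self._log_score(t, 1) > self._log_score(t, 0)) for t in texts]


def accuracy(detector, texts, labels):
    preds = detector.predict(texts)
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def cross_model_accuracy(detector, human_texts, target_texts):
    """Score a detector on human text vs. text from a model it was NOT trained on."""
    texts = human_texts + target_texts
    labels = [0] * len(human_texts) + [1] * len(target_texts)
    return accuracy(detector, texts, labels)


# Hypothetical toy data: hand-written sentences standing in for real corpora.
human_train = [
    "honestly no idea why the bus was late again",
    "we grabbed coffee and talked for hours",
]
source_train = [
    "in summary there are several key factors to consider",
    "overall it is important to note the following points",
]
human_test = ["no idea why we talked about the bus"]
source_test = ["in summary it is important to note several factors"]
target_test = ["it is important to consider several key points overall"]

# Train once on human vs. source-model text ...
detector = NaiveBayesDetector().fit(human_train + source_train, [0, 0, 1, 1])

# ... then evaluate on held-out source-model text and on unseen target-model text.
same_model = cross_model_accuracy(detector, human_test, source_test)
cross_model = cross_model_accuracy(detector, human_test, target_test)
print(f"same-model accuracy: {same_model:.2f}")
print(f"cross-model accuracy: {cross_model:.2f}")
```

The point of the sketch is the evaluation protocol, not the classifier: the detector is fitted once, and the gap between the two accuracies is what a cross-model study would measure across model sizes and families. With real data, the toy sentences would be replaced by held-out corpora for each source and target model.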
Taxonomy
Topics: Topic Modeling · Hate Speech and Cyberbullying Detection · Natural Language Processing Techniques
