A Survey on LLM-Generated Text Detection: Necessity, Methods, and Future Directions

Junchao Wu; Shu Yang; Runzhe Zhan; Yulin Yuan; Derek F. Wong; Lidia S.; Chao

arXiv:2310.14724·cs.CL·April 22, 2024·30 cites

TL;DR

This survey reviews recent advances, challenges, and future directions in detecting text generated by large language models, emphasizing the importance for responsible AI and societal impact.

Contribution

It provides a comprehensive overview of detection methods, datasets, challenges, and future research directions in LLM-generated text detection.

Findings

01. Notable progress in watermarking and neural-based detectors

02. Identification of key challenges like out-of-distribution issues

03. Highlighting the need for better evaluation frameworks

Abstract

Large language models (LLMs) have developed a powerful ability to understand, follow, and generate complex language, and as a result LLM-generated text is flooding many areas of daily life at remarkable speed and is widely accepted by humans. As LLMs continue to expand, there is an imperative need to develop detectors that can identify LLM-generated text. This is crucial to mitigate potential misuse of LLMs and to safeguard realms such as artistic expression and social networks from the harmful influence of LLM-generated content. LLM-generated text detection aims to discern whether a piece of text was produced by an LLM, which is essentially a binary classification task. Detection techniques have advanced notably in recent years, propelled by innovations in watermarking techniques, statistics-based detectors, neural-based detectors, and human-assisted methods. In this survey, we collate recent…
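The binary-classification framing in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `ThresholdDetector`, `repetition_score`, and `accuracy` are hypothetical, and the repetition statistic is a toy placeholder, not a detection signal taken from the survey. A real statistics-based detector would substitute a score derived from a language model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ThresholdDetector:
    """Hypothetical binary detector: flags text whose score meets a threshold."""
    score_fn: Callable[[str], float]
    threshold: float

    def predict(self, text: str) -> int:
        # Label convention assumed here: 1 = LLM-generated, 0 = human-written.
        return int(self.score_fn(text) >= self.threshold)


def repetition_score(text: str) -> float:
    """Toy statistic (an assumption, not from the survey):
    fraction of repeated tokens, i.e. 1 - (unique tokens / total tokens)."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return 1.0 - len(set(tokens)) / len(tokens)


def accuracy(detector: ThresholdDetector, samples: List[Tuple[str, int]]) -> float:
    """Share of (text, label) pairs the detector classifies correctly."""
    correct = sum(detector.predict(text) == label for text, label in samples)
    return correct / len(samples)
```

The point of the sketch is the interface rather than the statistic: any detector in this framing reduces to a scoring function plus a decision rule, which is what allows different detector families to be compared on the same labeled data.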

Code & Models

Repositories

nlp2ct/llm-generated-text-detection (Official)

Videos

A Survey on LLM-Generated Text Detection: Necessity, Methods, and Future Directions· underline

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Topic Modeling · Authorship Attribution and Profiling

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings