The Science of Detecting LLM-Generated Texts

Ruixiang Tang; Yu-Neng Chuang; Xia Hu

arXiv:2303.07205·cs.CL·June 6, 2023·50 cites

Open Access

TL;DR

This paper surveys current techniques for detecting texts generated by large language models, highlighting achievements, challenges, and future research directions to improve detection and regulation.

Contribution

It provides a comprehensive overview of existing detection methods, discusses key challenges, and emphasizes future research needs including evaluation metrics and open-source LLM threats.

Findings

01 Existing detection techniques vary in effectiveness

02 Open-source LLMs pose new detection challenges

03 Need for standardized evaluation metrics

Abstract

The emergence of large language models (LLMs) has resulted in the production of LLM-generated texts that are highly sophisticated and almost indistinguishable from texts written by humans. However, this has also sparked concerns about the potential misuse of such texts, such as spreading misinformation and causing disruptions in the education system. Although many detection approaches have been proposed, a comprehensive understanding of the achievements and challenges is still lacking. This survey aims to provide an overview of existing LLM-generated text detection techniques and enhance the control and regulation of language generation models. Furthermore, we emphasize crucial considerations for future research, including the development of comprehensive evaluation metrics and the threat posed by open-source LLMs, to drive progress in the area of LLM-generated text detection.

Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling