On the Detectability of LLM-Generated Text: What Exactly Is LLM-Generated Text?
Mingmeng Geng, Thierry Poibeau

TL;DR
This paper discusses the challenges in defining, detecting, and evaluating LLM-generated text, emphasizing the complexities introduced by human edits, diverse use cases, and the limitations of current benchmarks.
Contribution
It highlights the ambiguities in defining LLM-generated text and critiques existing detection benchmarks, proposing a more nuanced understanding of detector performance.
Findings
Detection is context-dependent and not universally reliable.
Current benchmarks do not capture real-world complexities.
Detector results should be interpreted with caution.
Abstract
With the widespread use of large language models (LLMs), many researchers have turned their attention to detecting text generated by them. However, there is no consistent or precise definition of their target, namely "LLM-generated text". Differences in usage scenarios and the diversity of LLMs further increase the difficulty of detection. What is commonly regarded as the detection target usually represents only a subset of the text that LLMs can potentially produce. Human edits to LLM outputs, together with the subtle influences that LLMs exert on their users, are blurring the line between LLM-generated and human-written text. Existing benchmarks and evaluation approaches do not adequately address the various conditions in real-world detector applications. Hence, the numerical results of detectors are often misunderstood, and their significance is diminishing. Therefore, detectors…
Taxonomy
Topics: Authorship Attribution and Profiling · Natural Language Processing Techniques · Text Readability and Simplification
