TL;DR
Advances in neural language models have made generated text more human-like, but improved decoding strategies mainly help such text fool human readers while making it easier for automatic systems to detect, which highlights the need to combine human and automatic detection.
Contribution
This study benchmarks three sampling-based decoding strategies and shows that tuning decoding to fool human readers makes the resulting text easier for automatic detectors to flag, underscoring the importance of combining human and automatic detection.
Findings
Decoding improvements mainly fool humans, not detectors.
Automatic detection becomes easier because these decoding strategies introduce statistical abnormalities into the text.
Multi-sentence excerpts can fool expert human raters over 30% of the time.
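The "statistical abnormalities" finding can be made concrete with a toy rank statistic. Top-k sampling only ever emits one of the k most probable tokens, so text produced that way never contains a token the model ranks k-th or lower, whereas untruncated samples occasionally do. The sketch below is a minimal illustration, not the paper's detector: it assumes a hypothetical Zipf-shaped distribution as a stand-in for a real language model's next-token probabilities, and all function names are illustrative.

```python
import random
from typing import Optional, Sequence


def zipf_distribution(vocab_size: int) -> list[float]:
    """Hypothetical toy next-token distribution: probability proportional to 1/rank."""
    weights = [1.0 / (r + 1) for r in range(vocab_size)]
    total = sum(weights)
    return [w / total for w in weights]


def sample(probs: Sequence[float], k: Optional[int] = None,
           rng: Optional[random.Random] = None) -> int:
    """Draw one token id; if k is given, restrict the draw to the k most probable ids (top-k)."""
    rng = rng if rng is not None else random.Random()
    ids = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if k is not None:
        ids = ids[:k]
    return rng.choices(ids, weights=[probs[i] for i in ids], k=1)[0]


def token_rank(probs: Sequence[float], token: int) -> int:
    """Rank of an observed token: how many ids the model scores strictly higher (0 = most likely)."""
    return sum(1 for q in probs if q > probs[token])


def fraction_outside_top_k(probs_per_step: Sequence[Sequence[float]],
                           tokens: Sequence[int], k: int) -> float:
    """Share of observed tokens that fall outside the model's top-k at their position."""
    ranks = [token_rank(p, t) for p, t in zip(probs_per_step, tokens)]
    return sum(1 for r in ranks if r >= k) / len(ranks)


# Usage: the same toy distribution at every one of 1000 positions.
rng = random.Random(0)
probs = zipf_distribution(50)
steps = [probs] * 1000
untruncated = [sample(probs, rng=rng) for _ in steps]
top_k_text = [sample(probs, k=10, rng=rng) for _ in steps]
print(fraction_outside_top_k(steps, untruncated, k=10))  # typically well above zero
print(fraction_outside_top_k(steps, top_k_text, k=10))   # 0.0 by construction
```

A real detector would compute such ranks with the actual language model over real excerpts; the toy only shows that truncation leaves a gap in the rank distribution that a simple statistic can pick up, even when the text reads fluently to a person.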
Abstract
Recent advancements in neural language modelling make it possible to rapidly generate vast amounts of human-sounding text. The capabilities of humans and automatic discriminators to detect machine-generated text have been a large source of research interest, but humans and machines rely on different cues to make their decisions. Here, we perform careful benchmarking and analysis of three popular sampling-based decoding strategies (top-k, nucleus sampling, and untruncated random sampling) and show that improvements in decoding methods have primarily optimized for fooling humans. This comes at the expense of introducing statistical abnormalities that make detection easy for automatic systems. We also show that though both human and automatic detector performance improve with longer excerpt length, even multi-sentence excerpts can fool expert human raters over 30% of the time. Our…
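The three decoding strategies named in the abstract differ only in how they truncate the model's next-token distribution before sampling from it. The sketch below is a minimal illustration assuming the raw logits for one generation step are already available as a list of floats; the helper names and the example values of k and p are hypothetical, not settings taken from the paper.

```python
import math
import random
from typing import Optional, Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Turn raw scores into probabilities (subtracting the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def _renormalise(probs: Sequence[float], keep: set[int]) -> list[float]:
    """Zero out every token not in `keep` and rescale the rest to sum to 1."""
    kept = [q if i in keep else 0.0 for i, q in enumerate(probs)]
    total = sum(kept)
    return [q / total for q in kept]


def truncate_top_k(probs: Sequence[float], k: int) -> list[float]:
    """Top-k: keep only the k most probable tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalise(probs, set(order[:k]))


def truncate_nucleus(probs: Sequence[float], p: float) -> list[float]:
    """Nucleus (top-p): keep the smallest set of most probable tokens whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep: set[int] = set()
    cumulative = 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return _renormalise(probs, keep)


def sample_next_token(
    logits: Sequence[float],
    k: Optional[int] = None,
    p: Optional[float] = None,
    rng: Optional[random.Random] = None,
) -> int:
    """Sample one token id with top-k, nucleus, or (if neither is set) untruncated sampling."""
    if k is not None and p is not None:
        raise ValueError("choose top-k or nucleus sampling, not both")
    rng = rng if rng is not None else random.Random()
    probs = softmax(logits)
    if k is not None:
        probs = truncate_top_k(probs, k)
    elif p is not None:
        probs = truncate_nucleus(probs, p)
    # With neither k nor p set, this draws from the full distribution
    # (untruncated random sampling).
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Usage on a hypothetical four-token vocabulary.
logits = [3.0, 2.0, 1.0, 0.0]
rng = random.Random(0)
print(sample_next_token(logits, k=2, rng=rng))    # always 0 or 1
print(sample_next_token(logits, p=0.9, rng=rng))  # never 3: tokens 0-2 already cover 0.9
print(sample_next_token(logits, rng=rng))         # any of 0-3
```

Truncation removes the low-probability tail that tends to produce incoherent words, which helps the output read well to people; the same removal is what leaves the rank distribution visibly different from human-written text.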
