How well can machine-generated texts be identified and can language models be trained to avoid identification?
Sinclair Schneider, Florian Steuber, Joao A. G. Schneider, Gabi Dreo Rodosek

TL;DR
This paper evaluates the effectiveness of machine learning classifiers in detecting AI-generated texts and explores how generative models can be trained to evade such detection, revealing significant challenges in distinguishing human from machine texts.
Contribution
It introduces a reinforcement learning method for refining generative language models so that their output evades BERT-based detection classifiers.
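This page does not state the reward used during reinforcement learning. As a minimal sketch of the general idea, and not the authors' implementation, one plausible choice is to reward the generator in proportion to how strongly a detector believes a sample is human-written. The function names, the 0.5 threshold, and the example scores below are all hypothetical.

```python
def evasion_reward(p_machine: float) -> float:
    """Hypothetical reward: high when the detector thinks the text is human.

    p_machine is the detector's probability that a sample is machine-generated.
    """
    if not 0.0 <= p_machine <= 1.0:
        raise ValueError("p_machine must be a probability in [0, 1]")
    return 1.0 - p_machine


def detection_accuracy(p_machine_scores, threshold=0.5):
    """Fraction of machine-generated samples the detector flags correctly."""
    flagged = sum(1 for p in p_machine_scores if p >= threshold)
    return flagged / len(p_machine_scores)


# Made-up detector scores for four machine-generated samples.
scores = [0.9, 0.2, 0.1, 0.05]
rewards = [evasion_reward(p) for p in scores]
accuracy = detection_accuracy(scores)
```

Under this assumed reward, a policy-gradient update would push the generator toward samples with low `p_machine`, which is what drives the detector's accuracy on generated text down.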
Findings
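Shallow learning classifiers such as Naive Bayes achieve detection accuracy between 0.6 and 0.8.
Transformer-based classifiers reach detection accuracy of 0.9 and above.
Generative models refined with reinforcement learning evade BERT-based classifiers, reducing detection accuracy to 0.15 or below.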
Abstract
With the rise of generative pre-trained transformer models such as GPT-3, GPT-NeoX, or OPT, distinguishing human-generated texts from machine-generated ones has become important. We refined five separate language models to generate synthetic tweets, uncovering that shallow learning classification algorithms, like Naive Bayes, achieve detection accuracy between 0.6 and 0.8. Shallow learning classifiers differ from human-based detection, especially when using higher temperature values during text generation, resulting in a lower detection rate. Humans prioritize linguistic acceptability, which tends to be higher at lower temperature values. In contrast, transformer-based classifiers have an accuracy of 0.9 and above. We found that using a reinforcement learning approach to refine our generative models can successfully evade BERT-based classifiers with a detection accuracy of 0.15 or…
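The abstract's remarks on temperature can be made concrete with a small illustration of temperature-scaled softmax sampling, the standard mechanism by which temperature affects generation. This is a sketch only: the logits are invented, and the temperatures 0.7 and 1.3 are arbitrary values chosen for contrast, not settings reported by the paper.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a flatter distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical next-token logits for a four-word vocabulary.
logits = [4.0, 2.0, 1.0, 0.5]
low = softmax_with_temperature(logits, 0.7)   # sharper, more predictable
high = softmax_with_temperature(logits, 1.3)  # flatter, more varied
```

A lower temperature concentrates probability on the most likely token, which tends to yield more conventional text. A higher temperature spreads probability over less likely tokens, producing more varied output.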
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods
Methods: Multi-Head Attention · Attention Is All You Need · Byte Pair Encoding · Cosine Annealing · Weight Decay · Linear Warmup With Cosine Annealing · Linear Layer · Layer Normalization · Softmax
