Your Large Language Models Are Leaving Fingerprints
Hope McGovern, Rickard Stureborg, Yoshi Suhara, Dimitris Alikaniotis

TL;DR
This paper demonstrates that simple classifiers using lexical and morphosyntactic features can reliably detect machine-generated text across various datasets and models, revealing persistent fingerprints left by large language models.
Contribution
It shows that LLMs leave detectable linguistic fingerprints that can be visualized and used for robust machine-generated text detection across domains and models.
Findings
Simple classifiers over n-gram and part-of-speech features achieve high detection accuracy on both in- and out-of-domain data.
Fingerprints are consistent across models within the same family.
Fine-tuned chat models are easier to detect than standard models.
Abstract
It has been shown that finetuned transformers and other supervised detectors effectively distinguish between human and machine-generated text in some situations (arXiv:2305.13242), but we find that even simple classifiers on top of n-gram and part-of-speech features can achieve very robust performance on both in- and out-of-domain data. To understand how this is possible, we analyze machine-generated output text in five datasets, finding that LLMs possess unique fingerprints that manifest as slight differences in the frequency of certain lexical and morphosyntactic features. We show how to visualize such fingerprints, describe how they can be used to detect machine-generated text, and find that they are even robust across textual domains. We find that fingerprints are often persistent across models in the same model family (e.g., llama-13b vs. llama-65b) and that models fine-tuned for chat…
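To make the idea of a frequency-based fingerprint concrete, the following is a minimal sketch, not the authors' implementation. It assumes whitespace tokenization, word unigrams and bigrams as the only features, and a multinomial naive Bayes classifier with add-one smoothing standing in for the unspecified "simple classifier". Part-of-speech features are omitted because they would need an external tagger. The function and class names (`ngrams`, `fingerprint_gap`, `NaiveBayesDetector`) and the four example sentences are hypothetical and serve only as an illustration.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Return word n-grams (n = 1..n_max) of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


def relative_frequencies(texts, n_max=2):
    """Relative frequency of every n-gram across a list of texts."""
    counts = Counter()
    for text in texts:
        counts.update(ngrams(text, n_max))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}


def fingerprint_gap(human_texts, machine_texts, n_max=2):
    """Per-n-gram difference in relative frequency (machine minus human)."""
    h = relative_frequencies(human_texts, n_max)
    m = relative_frequencies(machine_texts, n_max)
    return {g: m.get(g, 0.0) - h.get(g, 0.0) for g in set(h) | set(m)}


class NaiveBayesDetector:
    """Multinomial naive Bayes over n-gram counts (an assumed stand-in classifier)."""

    def fit(self, texts, labels):
        self.docs = Counter(labels)
        self.counts = {}
        for text, label in zip(texts, labels):
            self.counts.setdefault(label, Counter()).update(ngrams(text))
        self.vocab = set().union(*self.counts.values())
        return self

    def predict(self, text):
        n_docs = sum(self.docs.values())
        scores = {}
        for label, counts in self.counts.items():
            denom = sum(counts.values()) + len(self.vocab)  # add-one smoothing
            score = math.log(self.docs[label] / n_docs)
            for gram in ngrams(text):
                if gram in self.vocab:  # ignore n-grams never seen in training
                    score += math.log((counts[gram] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Toy data, invented for illustration only.
human = [
    "honestly i think the movie was kind of boring",
    "we went to the store and it was closed",
]
machine = [
    "as an ai language model i cannot provide opinions",
    "it is important to note that the movie has several themes",
]

gap = fingerprint_gap(human, machine)
detector = NaiveBayesDetector().fit(
    human + machine, ["human"] * len(human) + ["machine"] * len(machine)
)
print(detector.predict("it is important to note that results vary"))
```

Sorting `gap` by absolute value gives a crude view of which n-grams separate the two toy corpora most strongly. With real data, the same comparison would be run over much larger samples and richer feature sets.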
Taxonomy
Topics: Topic Modeling
