A Training-free Method for LLM Text Attribution
Tara Radvand, Mojtaba Abdolmaleki, Mohamed Mostagir, Ambuj Tewari

TL;DR
This paper introduces a training-free, zero-shot statistical method to identify whether a piece of text was generated by a specific LLM or an unknown model, with guarantees on error rates and practical validation.
Contribution
It develops a novel zero-shot testing framework for LLM text attribution that does not require training, with theoretical error bounds and robustness analysis.
Findings
Error probabilities decrease exponentially with text length.
The method is effective even with black-box sampling.
It is validated through numerical experiments and robustness tests.
Abstract
Verifying the provenance of content is crucial to the functioning of many organizations, e.g., educational institutions, social media platforms, and firms. This problem is becoming increasingly challenging as text generated by Large Language Models (LLMs) becomes almost indistinguishable from human-generated content. In addition, many institutions use in-house LLMs and want to ensure that external, non-sanctioned LLMs do not produce content within their institutions. In this paper, we answer the following question: Given a piece of text, can we identify whether it was produced by a particular LLM, while ensuring a guaranteed low false positive rate? We model LLM text as a sequential stochastic process with complete dependence on history. We then design zero-shot statistical tests to (i) distinguish between text generated by two different known sets of LLMs (non-sanctioned) and …
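The abstract is truncated here, so the paper's actual tests are not shown. As a rough illustration of the general idea only, the sketch below runs a generic log-likelihood-ratio test between two toy history-dependent token models and estimates how often text sampled from one is misattributed to the other at a given length. Everything in it is an assumption made for illustration: the vocabulary, the two models `model_p` and `model_q`, and the helper names are hypothetical and are not taken from the paper.

```python
import math
import random

# Hypothetical three-token vocabulary for the toy models below.
VOCAB = ["a", "b", "c"]

def model_p(history):
    """Toy stand-in for one LLM: next-token distribution depends on the last token."""
    if history and history[-1] == "a":
        return {"a": 0.2, "b": 0.5, "c": 0.3}
    return {"a": 0.5, "b": 0.3, "c": 0.2}

def model_q(history):
    """Toy stand-in for a second LLM: distribution depends on the history length."""
    if len(history) % 2 == 0:
        return {"a": 0.3, "b": 0.3, "c": 0.4}
    return {"a": 0.4, "b": 0.2, "c": 0.4}

def sample_text(model, length, rng):
    """Sample tokens one at a time, conditioning each on the tokens so far."""
    tokens = []
    for _ in range(length):
        dist = model(tokens)
        weights = [dist[t] for t in VOCAB]
        tokens.append(rng.choices(VOCAB, weights=weights)[0])
    return tokens

def log_likelihood(model, tokens):
    """Sum of log next-token probabilities of the text under the model."""
    total = 0.0
    history = []
    for tok in tokens:
        total += math.log(model(history)[tok])
        history.append(tok)
    return total

def attribute(tokens):
    """Return 'P' if the text is more likely under model_p, otherwise 'Q'."""
    llr = log_likelihood(model_p, tokens) - log_likelihood(model_q, tokens)
    return "P" if llr > 0 else "Q"

def error_rate(length, trials, seed=0):
    """Fraction of texts sampled from model_p that are attributed to model_q."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        if attribute(sample_text(model_p, length, rng)) != "P":
            errors += 1
    return errors / trials

if __name__ == "__main__":
    for n in (5, 20, 50, 200):
        print(n, error_rate(n, trials=200))
```

Running the script prints the empirical misattribution rate for several text lengths. In this toy setting the rate should shrink as the text gets longer, which is the qualitative behavior listed under Findings; it says nothing about the constants or guarantees proved in the paper.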
Taxonomy
Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques · Topic Modeling
