A Training-free Method for LLM Text Attribution

Tara Radvand; Mojtaba Abdolmaleki; Mohamed Mostagir; Ambuj Tewari

arXiv:2501.02406·stat.ML·March 24, 2026

Open Access · 1 Repo

TL;DR

This paper introduces a training-free, zero-shot statistical method to test whether a piece of text was generated by a specific LLM or by an unknown model, with guarantees on error rates and validation in numerical experiments.

Contribution

The paper develops a zero-shot testing framework for LLM text attribution that requires no training and comes with theoretical error bounds and a robustness analysis.

Findings

01. Error probabilities decrease exponentially with text length.

02. The method is effective even with black-box sampling.

03. The method is validated through numerical experiments and robustness tests.

Abstract

Verifying the provenance of content is crucial to the functioning of many organizations, e.g., educational institutions, social media platforms, and firms. This problem is becoming increasingly challenging as text generated by Large Language Models (LLMs) becomes almost indistinguishable from human-generated content. In addition, many institutions use in-house LLMs and want to ensure that external, non-sanctioned LLMs do not produce content within their institutions. In this paper, we answer the following question: Given a piece of text, can we identify whether it was produced by a particular LLM, while ensuring a guaranteed low false positive rate? We model LLM text as a sequential stochastic process with complete dependence on history. We then design zero-shot statistical tests to (i) distinguish between text generated by two different known sets of LLMs $A$ (non-sanctioned) and $B$ …
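To make the idea of a zero-shot attribution test concrete, here is a minimal sketch under loudly stated assumptions. The abstract is truncated before it describes the test statistic, so the statistic used here (the gap between a text's per-token log-perplexity under a model and that model's average next-token entropy) is one plausible choice, not necessarily the paper's. The two "LLMs" are hypothetical first-order Markov chains, a simplification of the full-history process the abstract describes, and every name (`MODEL_A`, `MODEL_B`, `gap_statistic`, `attributed_to`, `error_rate`) and the threshold of 0.3 are illustrative.

```python
import math
import random

# Hypothetical toy "LLMs" over a two-token vocabulary (assumption, not from the paper).
# Row i holds P(next token | current token i).
MODEL_A = [[0.9, 0.1], [0.1, 0.9]]   # sticky chain
MODEL_B = [[0.5, 0.5], [0.5, 0.5]]   # uniform chain


def sample(model, length, rng):
    """Draw `length` transitions from `model`, starting at token 0."""
    tokens = [0]
    for _ in range(length):
        row = model[tokens[-1]]
        tokens.append(rng.choices(range(len(row)), weights=row)[0])
    return tokens


def gap_statistic(model, tokens):
    """Absolute gap between per-token log-perplexity and average entropy."""
    nll = 0.0      # negative log-likelihood of the observed tokens
    entropy = 0.0  # entropy of the model's next-token distributions
    for prev, nxt in zip(tokens, tokens[1:]):
        row = model[prev]
        nll -= math.log(row[nxt])
        entropy -= sum(p * math.log(p) for p in row if p > 0)
    return abs(nll - entropy) / (len(tokens) - 1)


def attributed_to(model, tokens, threshold=0.3):
    """Accept 'model wrote this' when the gap is at most the threshold."""
    return gap_statistic(model, tokens) <= threshold


def error_rate(length, trials, rng):
    """Fraction of wrong decisions when testing for MODEL_A."""
    errors = 0
    for _ in range(trials):
        # Type I error: text from MODEL_A is rejected.
        errors += not attributed_to(MODEL_A, sample(MODEL_A, length, rng))
        # Type II error: text from MODEL_B is accepted.
        errors += attributed_to(MODEL_A, sample(MODEL_B, length, rng))
    return errors / (2 * trials)


rng = random.Random(0)
for length in (10, 50, 200):
    print(length, error_rate(length, trials=500, rng=rng))
```

In this toy setting the empirical error rate should shrink quickly as the text gets longer, which is the qualitative behavior the findings describe. No model is trained at any point: the test only evaluates next-token probabilities of a fixed model on the given text.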

Code & Models

Repositories

tararadvand74/llm-text-detection · Official

Taxonomy

Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques · Topic Modeling