Who Wrote the Book? Detecting and Attributing LLM Ghostwriters
Anudeex Shetty, Qiongkai Xu, Olga Ohrimenko, Jey Han Lau

TL;DR
This paper introduces GhostWriteBench, a dataset of long-form LLM-generated texts for LLM authorship attribution, and proposes TRACE, an interpretable, lightweight fingerprinting method that achieves state-of-the-art performance and remains robust in out-of-distribution settings.
Contribution
The paper presents the GhostWriteBench dataset and TRACE, a novel lightweight fingerprinting method for LLM authorship attribution that works for both open- and closed-source models.
Findings
TRACE achieves state-of-the-art performance.
TRACE remains robust in out-of-distribution settings.
TRACE works well with limited training data.
Abstract
In this paper, we introduce GhostWriteBench, a dataset for LLM authorship attribution. It comprises long-form texts (50K+ words per book) generated by frontier LLMs, and is designed to test generalisation across multiple out-of-distribution (OOD) dimensions, including domain and unseen LLM authors. We also propose TRACE, a novel fingerprinting method that is interpretable and lightweight, and that works for both open- and closed-source models. TRACE creates the fingerprint by capturing token-level transition patterns (e.g., word rank) estimated by another lightweight language model. Experiments on GhostWriteBench demonstrate that TRACE achieves state-of-the-art performance, remains robust in OOD settings, and works well in limited training data scenarios.
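The abstract describes TRACE only at a high level, so the following is a rough sketch of what a fingerprint built from token-level rank transitions could look like, not the authors' implementation. It assumes each token's rank under some lightweight scoring model has already been computed and is passed in as a list of 1-based integers. The bin edges, the function names (`rank_to_bin`, `transition_fingerprint`, `l1_distance`), and the L1 comparison are all illustrative assumptions.

```python
from collections import Counter
from typing import List, Sequence

# Hypothetical rank bins (an assumption, not taken from the paper):
# rank 1, ranks 2-5, ranks 6-20, ranks 21-100, ranks above 100.
BIN_EDGES = (1, 5, 20, 100)
NUM_BINS = len(BIN_EDGES) + 1


def rank_to_bin(rank: int) -> int:
    """Map a 1-based token rank to the index of its bin."""
    for i, edge in enumerate(BIN_EDGES):
        if rank <= edge:
            return i
    return len(BIN_EDGES)


def transition_fingerprint(ranks: Sequence[int]) -> List[List[float]]:
    """Build a row-normalised matrix of transitions between consecutive rank bins.

    Entry [a][b] is the fraction of tokens in bin a that are followed by a
    token in bin b. Rows for bins that never occur are left as all zeros.
    """
    bins = [rank_to_bin(r) for r in ranks]
    counts = Counter(zip(bins, bins[1:]))
    matrix = []
    for a in range(NUM_BINS):
        row_total = sum(counts[(a, b)] for b in range(NUM_BINS))
        matrix.append(
            [counts[(a, b)] / row_total if row_total else 0.0 for b in range(NUM_BINS)]
        )
    return matrix


def l1_distance(fp_a: List[List[float]], fp_b: List[List[float]]) -> float:
    """Sum of absolute differences between two fingerprints (illustrative metric)."""
    return sum(abs(x - y) for ra, rb in zip(fp_a, fp_b) for x, y in zip(ra, rb))


# Toy usage: ranks would normally come from a small language model scoring the text.
example_ranks = [1, 1, 3, 1, 50, 1]
fingerprint = transition_fingerprint(example_ranks)
```

Under these assumptions, attribution could be done by comparing the fingerprint of an unknown text against one reference fingerprint per candidate LLM and picking the closest. Because every matrix entry has a direct reading (how often a token in one rank bin follows a token in another), a representation of this kind stays inspectable, which is in the spirit of the interpretability the abstract claims.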
