A Bayesian Approach to Harnessing the Power of LLMs in Authorship Attribution

Zhengmian Hu; Tong Zheng; Heng Huang

arXiv:2410.21716 · cs.CL · October 30, 2024

TL;DR

This paper demonstrates that pre-trained large language models, combined with Bayesian methods, can effectively perform one-shot authorship attribution with high accuracy, offering a new approach for forensic linguistics.

Contribution

The paper introduces a Bayesian framework that uses the probability outputs of LLMs for one-shot authorship attribution and sets new performance benchmarks for the task.

Findings

1. Achieved 85% accuracy in one-shot authorship attribution on the IMDb and blog datasets.
2. Validated the approach with extensive ablation studies.
3. Set new baselines for LLM-based authorship analysis.

Abstract

Authorship attribution aims to identify the origin or author of a document. Traditional approaches have heavily relied on manual features and fail to capture long-range correlations, limiting their effectiveness. Recent advancements leverage text embeddings from pre-trained language models, which require significant fine-tuning on labeled data, posing challenges in data dependency and limited interpretability. Large Language Models (LLMs), with their deep reasoning capabilities and ability to maintain long-range textual associations, offer a promising alternative. This study explores the potential of pre-trained LLMs in one-shot authorship attribution, specifically utilizing Bayesian approaches and probability outputs of LLMs. Our methodology calculates the probability that a text entails previous writings of an author, reflecting a more nuanced understanding of authorship. By utilizing…
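
The Bayesian step described in the abstract can be illustrated with a minimal sketch. It assumes that, for each candidate author, an LLM has already produced a log-likelihood of the query text conditioned on that author's known writing (for example, by summing token log-probabilities). How those scores are obtained is not specified in this summary, so the function name `author_posterior`, the uniform prior, and the numeric scores below are all hypothetical illustrations rather than the authors' implementation.

```python
import math


def author_posterior(log_likelihoods, log_priors=None):
    """Turn per-author log P(text | author) into P(author | text) via Bayes' rule.

    log_likelihoods: dict mapping author name -> log-likelihood of the query
    text under that author (assumed to come from an LLM; not computed here).
    log_priors: optional dict of log prior probabilities per author.
    """
    authors = list(log_likelihoods)
    if log_priors is None:
        # Assumption: uniform prior over the candidate authors.
        log_priors = {a: -math.log(len(authors)) for a in authors}

    # Unnormalized log posterior: log P(text | author) + log P(author).
    joint = {a: log_likelihoods[a] + log_priors[a] for a in authors}

    # Log-sum-exp keeps the normalization stable for large negative values.
    m = max(joint.values())
    log_evidence = m + math.log(sum(math.exp(v - m) for v in joint.values()))

    return {a: math.exp(joint[a] - log_evidence) for a in authors}


# Made-up log-likelihoods for three hypothetical candidate authors.
scores = {"author_a": -412.3, "author_b": -415.1, "author_c": -420.8}
posterior = author_posterior(scores)
predicted = max(posterior, key=posterior.get)
print(predicted, posterior)
```

With a uniform prior the predicted author is simply the one with the highest log-likelihood; the posterior additionally gives a calibrated-looking confidence for each candidate, which is what makes the output interpretable as a probability rather than a raw score.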

Taxonomy

Topics: Library Science and Information Systems · Authorship Attribution and Profiling · Digital Rights Management and Security

Methods: Sparse Evolutionary Training