Whose LLM is it Anyway? Linguistic Comparison and LLM Attribution for GPT-3.5, GPT-4 and Bard

Ariel Rosenfeld; Teddy Lazebnik

arXiv:2402.14533 · cs.CL · September 3, 2025 · 5 cites


Open Access

TL;DR

This paper investigates whether different large language models have distinctive linguistic styles and demonstrates that texts can be attributed to their originating LLM with high accuracy based on linguistic features.

Contribution

It provides a comprehensive linguistic comparison of GPT-3.5, GPT-4, and Bard, and shows that their texts can be reliably attributed to the source model.

Findings

01. The three LLMs show significant linguistic variation.

02. A simple off-the-shelf classifier attributes texts to their source LLM with 88% accuracy.

03. The paper discusses implications for LLM identification and authorship attribution.

Abstract

Large Language Models (LLMs) can generate text that matches or surpasses human quality. However, it is unclear whether LLMs exhibit distinctive linguistic styles, as human authors do. Through a comprehensive linguistic analysis, we compare the vocabulary, Part-Of-Speech (POS) distribution, dependency distribution, and sentiment of texts generated by three of today's most popular LLMs (GPT-3.5, GPT-4, and Bard) in response to diverse inputs. The results point to significant linguistic variations which, in turn, enable us to attribute a given text to its LLM origin with a favorable 88% accuracy using a simple off-the-shelf classification model. Theoretical and practical implications of this intriguing finding are discussed.
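
The attribution idea in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the paper compares vocabulary, POS distribution, dependency distribution, and sentiment, whereas the sketch below assumes a few surface-level proxies (type-token ratio, sentence length, word length, comma rate) and swaps in a nearest-centroid classifier. The function names, the labels `model_a` and `model_b`, and the toy texts are all hypothetical and exist only to make the example run.

```python
import math
import re
from collections import defaultdict


def features(text):
    """Map a text to a small vector of surface-level style features.

    These are stand-ins for the richer linguistic features (POS,
    dependency, sentiment) described in the abstract.
    """
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)
    return [
        len(set(words)) / n_words,             # type-token ratio
        n_words / max(len(sentences), 1),      # mean sentence length (words)
        sum(len(w) for w in words) / n_words,  # mean word length (chars)
        text.count(",") / n_words,             # commas per word
    ]


def fit_centroids(samples):
    """Average the feature vectors of each label: {label: centroid}."""
    grouped = defaultdict(list)
    for text, label in samples:
        grouped[label].append(features(text))
    return {
        label: [sum(col) / len(col) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def attribute(text, centroids):
    """Return the label whose centroid is closest in Euclidean distance."""
    vec = features(text)
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


# Hypothetical toy data: two invented "styles", not real LLM output.
TOY_SAMPLES = [
    ("The cat sat. It was warm. The sun was out.", "model_a"),
    ("Rain fell. We stayed in. Tea was made.", "model_a"),
    ("Although the weather was pleasant, the committee, after considerable "
     "deliberation, decided to postpone the outdoor event until further "
     "notice.", "model_b"),
    ("In light of the recent findings, which were both surprising and "
     "significant, the researchers proposed a comprehensive, multi-stage "
     "follow-up study.", "model_b"),
]

centroids = fit_centroids(TOY_SAMPLES)
print(attribute("Dogs bark. Birds sing. Fish swim.", centroids))
```

The features are left unscaled, so mean sentence length dominates the distance here. A realistic setup would standardize the features, use far more text per model, and evaluate on held-out samples before reporting an accuracy figure.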

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Linear Layer · Dropout · Dense Connections · Label Smoothing · Adam · Attention Is All You Need · Softmax · Multi-Head Attention · Layer Normalization · Residual Connection