The Statistical Signature of LLMs

Ortal Hadad; Edoardo Loru; Jacopo Nudo; Niccolò Di Marco; Matteo Cinelli; Walter Quattrociocchi

arXiv:2602.18152 · cs.CL · February 23, 2026

Open Access

TL;DR

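This paper shows that lossless compression is a simple, model-agnostic measure of statistical regularity that exposes a structural signature of probabilistic text generation by large language models. The signature is tested in three settings: controlled human-LLM continuations, Wikipedia vs. Grokipedia, and Moltbook vs. Reddit.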

Contribution

The paper introduces a framework that works on surface text alone and uses lossless compression to quantify how LLMs alter the statistical structure of language across different environments.
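A compression-based regularity measure of this kind can be sketched in a few lines. The snippet below is only an illustration under stated assumptions: it uses Python's `zlib` (DEFLATE) as the lossless compressor and defines the score as compressed bytes divided by raw bytes. This summary does not say which compressor or normalization the authors use, and the helper name `compression_ratio` and both sample strings are hypothetical.

```python
import random
import string
import zlib


def compression_ratio(text: str, level: int = 9) -> float:
    """Compressed size over raw size, in bytes. Lower means more regular text.

    Assumption: zlib stands in for whatever lossless compressor the paper uses.
    """
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("text must be non-empty")
    return len(zlib.compress(raw, level)) / len(raw)


# Toy inputs for illustration only; they are not data from the paper.
regular = "the model generates the text and " * 40

# Seeded random letters and spaces of the same length: little reusable structure.
rng = random.Random(0)
noisy = "".join(rng.choices(string.ascii_lowercase + " ", k=len(regular)))

print(f"regular: {compression_ratio(regular):.3f}")
print(f"noisy:   {compression_ratio(noisy):.3f}")
```

The more repetitive string yields the lower ratio. A comparison between human and LLM text would apply the same score to length-matched samples from each source.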

Findings

01. In controlled and mediated contexts, LLMs produce more regular and compressible text than humans.

02. Compression reveals a persistent structural signature of probabilistic generation.

03. The signature's scale dependence limits surface-level distinguishability at small scales.
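One generic reason a compression signal weakens on short texts can be shown with a small experiment. The sketch below is an assumption-laden illustration, not the paper's procedure: it compresses byte prefixes of increasing length with `zlib` and reports the ratio at each length. The function name `ratio_by_length`, the chosen lengths, and the toy strings are all hypothetical.

```python
import random
import string
import zlib


def ratio_by_length(text: str, lengths: list[int]) -> dict[int, float]:
    """Compression ratio of the first n bytes of text, for each n in lengths.

    Lengths longer than the encoded text are skipped.
    """
    raw = text.encode("utf-8")
    ratios = {}
    for n in lengths:
        if n > len(raw):
            continue
        ratios[n] = len(zlib.compress(raw[:n], 9)) / n
    return ratios


# Toy inputs for illustration only; they are not data from the paper.
regular = "the model generates the text and " * 40
rng = random.Random(0)
noisy = "".join(rng.choices(string.ascii_lowercase + " ", k=len(regular)))

lengths = [16, 64, 256, 1024]
regular_ratios = ratio_by_length(regular, lengths)
noisy_ratios = ratio_by_length(noisy, lengths)

for n in lengths:
    gap = noisy_ratios[n] - regular_ratios[n]
    print(f"n={n:5d}  regular={regular_ratios[n]:.3f}  "
          f"noisy={noisy_ratios[n]:.3f}  gap={gap:.3f}")
```

At very short lengths the compressor's fixed overhead dominates and there is little repetition to exploit, so the two ratios sit close together. The gap opens only as the prefix grows, which is consistent in spirit with the scale dependence described in the finding above.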

Abstract

Large language models generate text through probabilistic sampling from high-dimensional distributions, yet how this process reshapes the structural statistical organization of language remains incompletely characterized. Here we show that lossless compression provides a simple, model-agnostic measure of statistical regularity that differentiates generative regimes directly from surface text. We analyze compression behavior across three progressively more complex information ecosystems: controlled human-LLM continuations, generative mediation of a knowledge infrastructure (Wikipedia vs. Grokipedia), and fully synthetic social interaction environments (Moltbook vs. Reddit). Across settings, compression reveals a persistent structural signature of probabilistic generation. In controlled and mediated contexts, LLM-produced language exhibits higher structural regularity and compressibility…

Taxonomy

Topics: Language and cultural evolution · Text Readability and Simplification · Authorship Attribution and Profiling