Autocorrelations Decay in Texts and Applicability Limits of Language Models

Nikolay Mikhaylovskiy; Ilya Churilov

arXiv:2305.06615·cs.CL·May 12, 2023·1 cites

TL;DR

This paper relates the decay of word autocorrelations in texts to the applicability limits of language models. Autocorrelations in texts are found to decay as a power law, which suggests that language models exhibiting Markov behavior, including large autoregressive models, may be limited when analyzing or generating long texts.

Contribution

The study empirically links autocorrelation decay in texts to the applicability limits of language models and shows that generated texts differ from literary texts in how their autocorrelations decay.

Findings

01

Autocorrelations of words in texts decay according to a power law.

02

Distributional semantics yields consistent decay exponents across languages.

03

Generated texts show different autocorrelation decay patterns from literary texts.
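
To make the contrast behind these findings concrete, the two decay shapes can be written side by side. The notation is chosen here for illustration and is not taken from the paper: C(τ) is the autocorrelation at lag τ (in words), α is a decay exponent, and ξ is a correlation length. A power law has no characteristic scale, while correlations in an ergodic finite-state Markov chain generally fall off exponentially, so they become negligible beyond a few multiples of ξ.

```latex
C(\tau) \propto \tau^{-\alpha} \quad \text{(power law)}
\qquad \text{vs.} \qquad
C(\tau) \propto e^{-\tau/\xi} \quad \text{(exponential)}
```

On a log-log plot the first form is a straight line with slope -α, whereas the second bends downward ever more steeply as τ grows.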

Abstract

We show that the laws of autocorrelations decay in texts are closely related to applicability limits of language models. Using distributional semantics we empirically demonstrate that autocorrelations of words in texts decay according to a power law. We show that distributional semantics provides coherent autocorrelations decay exponents for texts translated to multiple languages. The autocorrelations decay in generated texts is quantitatively and often qualitatively different from the literary texts. We conclude that language models exhibiting Markov behavior, including large autoregressive language models, may have limitations when applied to long texts, whether analysis or generation.
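
As a rough illustration of the kind of measurement the abstract describes, the sketch below estimates the autocorrelation of a sequence of word vectors and fits a power-law exponent by linear regression in log-log space. It is a minimal sketch under stated assumptions, not the authors' procedure: the estimator (mean dot product of centered vectors, normalized by the lag-0 value), the function names `autocorrelation` and `fit_power_law`, and the choice to fit only positive values are all hypothetical choices made here.

```python
import numpy as np

def autocorrelation(vectors, max_lag):
    """Normalized autocorrelation of a vector sequence at lags 1..max_lag.

    Assumed estimator: mean dot product of centered vectors `lag` steps
    apart, divided by the mean squared norm (the lag-0 value).
    """
    x = np.asarray(vectors, dtype=float)
    x = x - x.mean(axis=0)                     # center each embedding dimension
    c0 = np.mean(np.sum(x * x, axis=1))        # lag-0 value used for normalization
    lags = np.arange(1, max_lag + 1)
    c = np.array([np.mean(np.sum(x[:-k] * x[k:], axis=1)) for k in lags]) / c0
    return lags, c

def fit_power_law(lags, c):
    """Fit c ~ a * lag**(-alpha) by least squares on log-log values.

    Only positive autocorrelation values are used, since the logarithm
    is undefined otherwise. Returns (alpha, a).
    """
    lags = np.asarray(lags, dtype=float)
    c = np.asarray(c, dtype=float)
    mask = c > 0
    slope, intercept = np.polyfit(np.log(lags[mask]), np.log(c[mask]), 1)
    return -slope, np.exp(intercept)
```

In use, `vectors` would be one embedding per word in reading order (for example, looked up from any pretrained word-embedding table), and the fitted `alpha` would be compared across texts or translations. A log-log fit like this is sensitive to the lag range and to noise at large lags, so it should be read as a first approximation only.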

Taxonomy

Topics: Topic Modeling · Opinion Dynamics and Social Influence · Computational and Text Analysis Methods