# Is Natural Language a Perigraphic Process? The Theorem about Facts and Words Revisited

**Authors:** Łukasz Dębowski

arXiv: 1706.04432 · 2020-03-11

## TL;DR

This paper treats natural language as a stochastic process. It defines perigraphic processes, in which the number of algorithmic facts that can be inferred from a finite text grows like a power of the text length, and presents empirical evidence from a sample of Shakespeare's plays suggesting that natural language is such a process.

## Contribution

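It defines perigraphic processes and demonstrates the theorem about facts and words: the number of probabilistic or algorithmic facts that can be inferred from a text must be roughly smaller than the number of distinct word-like strings that the PPM compression algorithm detects in that text.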

## Key findings

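- In a perigraphic process, the number of algorithmic facts that can be inferred from a finite text grows like a power of the text length; the paper gives a simple example of such a process.
- For a sample of plays by Shakespeare, the number of word-like strings detected by the PPM compression algorithm follows an empirical stepwise power law, in stark contrast to Markov processes.
- The author therefore supposes that natural language, considered as a process, is not only non-Markov but also perigraphic.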

## Abstract

As we discuss, a stationary stochastic process is nonergodic when a random persistent topic can be detected in the infinite random text sampled from the process, whereas we call the process strongly nonergodic when an infinite sequence of independent random bits, called probabilistic facts, is needed to describe this topic completely. Replacing probabilistic facts with an algorithmically random sequence of bits, called algorithmic facts, we adapt this property back to ergodic processes. Subsequently, we call a process perigraphic if the number of algorithmic facts which can be inferred from a finite text sampled from the process grows like a power of the text length. We present a simple example of such a process. Moreover, we demonstrate an assertion which we call the theorem about facts and words. This proposition states that the number of probabilistic or algorithmic facts which can be inferred from a text drawn from a process must be roughly smaller than the number of distinct word-like strings detected in this text by means of the PPM compression algorithm. We also observe that the number of the word-like strings for a sample of plays by Shakespeare follows an empirical stepwise power law, in a stark contrast to Markov processes. Hence we suppose that natural language considered as a process is not only non-Markov but also perigraphic.
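To make "grows like a power of the text length" concrete, here is a minimal Python sketch that estimates a power-law exponent from pairs of text length and count by least squares on a log-log scale. It is an illustration only, not the paper's procedure: the function name `fit_power_law` is hypothetical, the data are synthetic, and the counts are not produced by PPM. The logarithmic curve is included purely as a contrast and is not a claim about how any particular process behaves.

```python
import math


def fit_power_law(lengths, counts):
    """Fit counts ~ c * lengths**beta by least squares on log-log axes.

    Hypothetical helper for illustration; returns the pair (c, beta).
    """
    xs = [math.log(n) for n in lengths]
    ys = [math.log(v) for v in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Slope of the regression line of log(count) on log(length).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    # Intercept of the regression line, mapped back from the log scale.
    c = math.exp(mean_y - beta * mean_x)
    return c, beta


# Synthetic text lengths (assumption: doubling lengths from 2**10 to 2**20).
lengths = [2 ** k for k in range(10, 21)]

# Synthetic counts that follow an exact power law with exponent 0.5.
power_counts = [3.0 * n ** 0.5 for n in lengths]

# Synthetic counts that grow only logarithmically, for contrast.
log_counts = [10.0 * math.log(n) for n in lengths]

c_pow, beta_pow = fit_power_law(lengths, power_counts)
c_log, beta_log = fit_power_law(lengths, log_counts)

print(f"power-law data:   c = {c_pow:.3f}, beta = {beta_pow:.3f}")
print(f"logarithmic data: c = {c_log:.3f}, beta = {beta_log:.3f}")
```

On the exact power-law data the fit recovers the exponent it was generated with, while the logarithmic data yields a much smaller fitted exponent over the same range. A single straight-line fit like this ignores the steps of a stepwise power law, so it gives only a rough summary of such a curve.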

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1706.04432/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1706.04432/full.md

## References

39 references — full list in the complete paper: https://tomesphere.com/paper/1706.04432/full.md

---
Source: https://tomesphere.com/paper/1706.04432