# The role of grammar in transition-probabilities of subsequent words in English text

**Authors:** Rudolf Hanel, Stefan Thurner

arXiv: 1812.10991 · 2018-12-31

## TL;DR

This paper shows that superimposing grammatical constraints, as local word re-orderings, on a sample-space reducing process is sufficient to explain both word frequencies and word-to-word transition probabilities in English text. The resulting model is compared with the Bernoulli, Simon, and Monkey typewriting models.

## Contribution

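It introduces a model that combines a sample-space reducing (SSR) word generation process with local grammatical re-ordering (permutation) of the generated words, so that the generated text reproduces both the word frequencies and the word-to-word transition structure of English.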

## Key findings

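- A pure SSR process offers a natural explanation of Zipf's law in word frequencies, but it fails to capture the structure of word-to-word transition probability matrices.
- Superimposing grammatical structure, as a local word re-ordering, on the SSR process is sufficient to explain both word frequencies and transition probabilities.
- The grammatically ordered SSR model is compared with the Bernoulli, Simon, and Monkey typewriting models on several test statistics of real texts.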

## Abstract

Sentence formation is a highly structured, history-dependent, and sample-space reducing (SSR) process. While the first word in a sentence can be chosen from the entire vocabulary, the freedom of choosing subsequent words typically becomes more and more constrained by grammar and context as the sentence progresses. This sample-space reducing property offers a natural explanation of Zipf's law in word frequencies; however, it fails to capture the structure of the word-to-word transition probability matrices of English text. Here we adopt the view that grammatical constraints (such as subject-predicate-object) locally re-order the words in sentences that are sampled with an SSR word generation process. We demonstrate that superimposing grammatical structure, as a local word re-ordering (permutation) process, on a sample-space reducing process is sufficient to explain both word frequencies and word-to-word transition probabilities. We compare the quality of the grammatically ordered SSR model in reproducing several test statistics of real texts with other text generation models, such as the Bernoulli model, the Simon model, and the Monkey typewriting model.
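
The mechanism described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that words are represented by integer ranks `1..n_words`, that the first word is drawn uniformly from the whole vocabulary, and that each later word is drawn uniformly from the ranks below the previous one. The names `ssr_sentence`, `local_reorder`, `rank_counts`, and the `window` parameter are hypothetical. In particular, shuffling words inside fixed-size blocks is only a stand-in for grammatical re-ordering; the rule actually used in the paper is given in the full text.

```python
import random
from collections import Counter


def ssr_sentence(n_words, rng):
    """Sample one 'sentence' of word ranks from a sample-space reducing process.

    Assumption: the first word is uniform on 1..n_words, every later word is
    uniform on the ranks strictly below the previous word, and the sentence
    ends once rank 1 is reached.
    """
    word = rng.randint(1, n_words)  # randint is inclusive on both ends
    sentence = [word]
    while word > 1:
        word = rng.randint(1, word - 1)
        sentence.append(word)
    return sentence


def local_reorder(sentence, window, rng):
    """Shuffle the words inside consecutive blocks of length `window`.

    Hypothetical stand-in for grammatical re-ordering, NOT the paper's rule.
    """
    reordered = []
    for start in range(0, len(sentence), window):
        block = sentence[start:start + window]  # slicing copies the block
        rng.shuffle(block)
        reordered.extend(block)
    return reordered


def rank_counts(sentences):
    """Count how often each word rank occurs across all sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence)
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    plain = [ssr_sentence(1000, rng) for _ in range(20000)]
    ordered = [local_reorder(s, 3, rng) for s in plain]
    counts = rank_counts(ordered)
    for rank in (1, 2, 4, 8):
        # Under the assumptions above this ratio should be close to 1/rank.
        print(rank, counts[rank] / len(ordered))
```

In a pure SSR sentence the ranks strictly decrease, so every word can only be followed by a lower-ranked one. The block-wise shuffle leaves the word counts of each sentence unchanged, and with them the Zipf-like frequency profile, while permitting transitions in both directions.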

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.10991/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1812.10991/full.md

## References

22 references — full list in the complete paper: https://tomesphere.com/paper/1812.10991/full.md

---
Source: https://tomesphere.com/paper/1812.10991