Information-theoretical analysis of the statistical dependencies among three variables: Applications to written language
Damián G. Hernández, Damián H. Zanette, Inés Samengo

TL;DR
This paper introduces an information-theoretical framework to analyze complex statistical dependencies among three variables, with applications to understanding the structure and semantics of written language.
Contribution
It develops new measures for pure triple interactions, derives bounds on these dependencies, and applies both to the analysis of written texts.
Findings
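The probability of finding a word at a given location in a text depends on the presence or absence of nearby words and of nearby pairs of words.
The analysis identifies key semantic words and word triplets with high triple interaction.
It also identifies mediating words, that is, words that mediate the information shared by two other words.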
Abstract
We develop the information-theoretical concepts required to study the statistical dependencies among three variables. Some of such dependencies are pure triple interactions, in the sense that they cannot be explained in terms of a combination of pairwise correlations. We derive bounds for triple dependencies, and characterize the shape of the joint probability distribution of three binary variables with high triple interaction. The analysis also allows us to quantify the amount of redundancy in the mutual information between pairs of variables, and to assess whether the information between two variables is or is not mediated by a third variable. These concepts are applied to the analysis of written texts. We find that the probability that a given word is found in a particular location within the text is not only modulated by the presence or absence of other nearby words, but also, on…
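As a rough illustration of the ideas in the abstract, the sketch below computes three standard textbook quantities for three binary variables: the mutual information I(X;Y), the conditional mutual information I(X;Y|Z), and their difference (often called co-information or interaction information). These are assumptions chosen for illustration and are not necessarily the exact measures defined in the paper. The function names and the two toy distributions (`xor_joint`, `copy_joint`) are hypothetical.

```python
from itertools import product
from math import log2


def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(q * log2(q) for q in dist.values() if q > 0)


def marginal(joint, axes):
    """Marginalize a joint distribution over (x, y, z) onto the given axes."""
    out = {}
    for outcome, q in joint.items():
        key = tuple(outcome[a] for a in axes)
        out[key] = out.get(key, 0.0) + q
    return out


def mutual_information(joint, a, b):
    """I(A;B) = H(A) + H(B) - H(A,B)."""
    return (entropy(marginal(joint, (a,)))
            + entropy(marginal(joint, (b,)))
            - entropy(marginal(joint, (a, b))))


def conditional_mutual_information(joint, a, b, c):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    return (entropy(marginal(joint, (a, c)))
            + entropy(marginal(joint, (b, c)))
            - entropy(marginal(joint, (a, b, c)))
            - entropy(marginal(joint, (c,))))


def co_information(joint):
    """I(X;Y) - I(X;Y|Z): positive values suggest redundancy, negative synergy."""
    return (mutual_information(joint, 0, 1)
            - conditional_mutual_information(joint, 0, 1, 2))


# Toy case 1 (assumed): z = x XOR y with x, y independent and uniform.
# No pair of variables is correlated, yet the three are jointly dependent.
xor_joint = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}

# Toy case 2 (assumed): x = y = z, a single uniform bit copied three times.
# Here I(X;Y|Z) = 0, so the information between x and y is fully mediated by z.
copy_joint = {(b, b, b): 0.5 for b in (0, 1)}

for name, joint in (("xor", xor_joint), ("copy", copy_joint)):
    print(name,
          mutual_information(joint, 0, 1),
          conditional_mutual_information(joint, 0, 1, 2),
          co_information(joint))
```

The XOR case is a minimal example of a dependency that no combination of pairwise correlations can explain, while the copy case shows fully redundant pairwise information that vanishes once the third variable is known.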
