Identifying the Periodicity of Information in Natural Language
Yulin Ou, Yu Wang, Yang Xu, Hendrik Buschmeier

TL;DR
This paper introduces AutoPeriod of Surprisal (APS), a novel method for detecting periodicity in the information content of natural language, revealing significant patterns and new periodicities in text.
Contribution
The paper presents APS, a new periodicity detection algorithm for surprisal sequences, and demonstrates its effectiveness in uncovering meaningful periodic patterns in language.
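A minimal sketch of how such a detector could work, assuming an AutoPeriod-style procedure: candidate periods are periodogram peaks whose power exceeds a threshold estimated from shuffled copies of the sequence, and each candidate is kept only if the autocorrelation function (ACF) has a local peak near it. The input `x` is assumed to be a per-token surprisal sequence, s_i = -log2 P(w_i | w_<i), produced by some language model (not shown). The function names (`periodogram`, `acf`, `autoperiod`), the permutation count, the 0.99 quantile, and the simple local-maximum test on the ACF (a stand-in for a more careful "hill" check) are all hypothetical choices for illustration, not the authors' implementation.

```python
import math
import random


def periodogram(x):
    """Power at DFT bins k = 1 .. n//2 of the mean-centered series (naive O(n^2) DFT)."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    # Precomputed trig tables: cos(2*pi*k*t/n) == cos_t[(k*t) % n].
    cos_t = [math.cos(2 * math.pi * j / n) for j in range(n)]
    sin_t = [math.sin(2 * math.pi * j / n) for j in range(n)]
    power = {}
    for k in range(1, n // 2 + 1):
        re = sum(v * cos_t[(k * t) % n] for t, v in enumerate(xc))
        im = sum(v * sin_t[(k * t) % n] for t, v in enumerate(xc))
        power[k] = (re * re + im * im) / n
    return power


def acf(x, max_lag):
    """Sample autocorrelation for lags 0 .. max_lag."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    var = sum(v * v for v in xc)
    return [sum(xc[t] * xc[t + lag] for t in range(n - lag)) / var
            for lag in range(max_lag + 1)]


def autoperiod(x, n_perm=100, quantile=0.99, seed=0):
    """Return ACF-validated periods (in tokens) of a surprisal sequence x (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(x)
    # Step 1: threshold = chosen quantile of the max periodogram power over shuffled copies.
    maxima = []
    for _ in range(n_perm):
        shuffled = x[:]
        rng.shuffle(shuffled)
        maxima.append(max(periodogram(shuffled).values()))
    maxima.sort()
    threshold = maxima[min(int(quantile * n_perm), n_perm - 1)]
    # Step 2: candidate bins whose power exceeds the threshold (k = 1, period n, is skipped).
    power = periodogram(x)
    candidates = [k for k, p in power.items() if p > threshold and k > 1]
    # Step 3: keep a candidate if the ACF has a positive local maximum in the lag
    # range spanned by the neighbouring bins, n/(k+1) .. n/(k-1); report that lag.
    r = acf(x, n // 2)
    periods = set()
    for k in candidates:
        lo = max(2, math.floor(n / (k + 1)))
        hi = min(n // 2 - 1, math.ceil(n / (k - 1)))
        peaks = [lag for lag in range(lo, hi + 1)
                 if r[lag] > 0 and r[lag] > r[lag - 1] and r[lag] >= r[lag + 1]]
        if peaks:
            periods.add(max(peaks, key=lambda lag: r[lag]))
    return sorted(periods)
```

For instance, a synthetic sequence of length 160 built as `3 + sin(2*pi*t/8)` plus small Gaussian noise should yield a single period at lag 8. The naive DFT keeps the sketch dependency-free; for long documents an FFT (e.g., `numpy.fft.rfft`) would be the natural replacement.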
Findings
A significant portion of human language shows strong periodicity in information.
New periodicities outside typical structural units are identified and validated.
Periodic patterns result from both structured language factors and longer-range influences.
Abstract
Recent theoretical advances in the study of information density in natural language have raised the following question: to what degree does natural language exhibit periodic patterns in its encoded information? We address this question by introducing a new method called AutoPeriod of Surprisal (APS). APS adopts a canonical periodicity detection algorithm and can identify any significant periods in the surprisal sequence of a single document. Applying the algorithm to a set of corpora, we obtained the following results: first, a considerable proportion of human language demonstrates a strong pattern of periodicity in information; second, new periods that lie outside the distributions of typical structural units in text (e.g., sentence boundaries, elementary discourse units) are found and further confirmed via harmonic regression modeling.…
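The abstract states that newly found periods are confirmed via harmonic regression. One minimal way to sketch such a check, assuming an ordinary least-squares fit y_t = b0 + sum_h [a_h cos(2*pi*h*t/T) + b_h sin(2*pi*h*t/T)] at a candidate period T, scored by R² and compared against shuffled copies of the sequence, is shown below. The permutation test, the helper names (`solve`, `harmonic_r2`, `permutation_pvalue`), and the default of one harmonic are assumptions of this sketch; the paper's exact model specification and test statistic are not given here.

```python
import math
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for c in range(col, n + 1):
                m[i][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def harmonic_r2(y, period, n_harmonics=1):
    """R^2 of an OLS harmonic fit at `period` (expects period > 2 to avoid a zero sine column)."""
    rows = []
    for t in range(len(y)):
        row = [1.0]  # intercept b0
        for h in range(1, n_harmonics + 1):
            w = 2 * math.pi * h * t / period
            row += [math.cos(w), math.sin(w)]
        rows.append(row)
    p = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    mean = sum(y) / len(y)
    ss_tot = sum((v - mean) ** 2 for v in y)
    ss_res = sum((v - sum(bi * ri for bi, ri in zip(beta, r))) ** 2
                 for r, v in zip(rows, y))
    return 1.0 - ss_res / ss_tot


def permutation_pvalue(y, period, n_perm=200, seed=0):
    """Share of shuffled sequences whose harmonic fit is at least as good as the original's."""
    rng = random.Random(seed)
    observed = harmonic_r2(y, period)
    hits = 0
    for _ in range(n_perm):
        z = y[:]
        rng.shuffle(z)
        if harmonic_r2(z, period) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Shuffling destroys temporal order but keeps the distribution of surprisal values, so a small p-value suggests the fit at period T reflects sequential structure rather than the marginal distribution alone.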
