A Comparison of Two Fluctuation Analyses for Natural Language Clustering Phenomena: Taylor and Ebeling & Neiman Methods
Kumiko Tanaka-Ishii, Shuntaro Takahashi

TL;DR
This paper compares two fluctuation analysis methods, Taylor's and Ebeling & Neiman's, applied to natural language text. Both can distinguish real text from i.i.d. sequences, and both differentiate text categories to some degree, the Taylor exponent more clearly than the Ebeling & Neiman exponent.
Contribution
It clarifies the similarities and differences between the two methods analytically and presents a large-scale application of both to text.
Findings
Both methods distinguish real text from i.i.d. sequences.
Taylor exponents computed from words can roughly distinguish text categories; Ebeling & Neiman exponents do so to a lesser extent.
Both methods show potential in capturing script types.
Abstract
This article considers the fluctuation analysis methods of Taylor and Ebeling & Neiman. While both have been applied to various phenomena in the statistical mechanics domain, their similarities and differences have not been clarified. After considering their analytical aspects, this article presents a large-scale application of these methods to text. It is found that both methods can distinguish real text from independently and identically distributed (i.i.d.) sequences. Furthermore, it is found that the Taylor exponents acquired from words can roughly distinguish text categories; this is also the case for Ebeling and Neiman exponents, but to a lesser extent. Additionally, both methods show some possibility of capturing script kinds.
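The abstract does not spell out how a Taylor exponent is obtained, so the following is only a minimal sketch under stated assumptions, not the authors' procedure. It assumes the common formulation of Taylor's law for sequences: split the token sequence into non-overlapping windows, compute for each word the mean and standard deviation of its per-window count, and fit the exponent as the slope of log standard deviation against log mean. The function names `taylor_exponent` and `iid_zipf_sequence`, the window size, and the plain least-squares fit are all illustrative choices.

```python
import math
import random
from collections import Counter


def taylor_exponent(tokens, window=1000):
    """Estimate alpha in sigma ~ mu**alpha (assumed form of Taylor's law).

    tokens: sequence of hashable items (words or characters).
    window: segment length; an illustrative choice, not from the paper.
    """
    n_segments = len(tokens) // window
    if n_segments < 2:
        raise ValueError("need at least two full windows")
    # Per-window counts of every token; a trailing partial window is dropped.
    segments = [
        Counter(tokens[i * window:(i + 1) * window]) for i in range(n_segments)
    ]
    vocab = set(tokens[:n_segments * window])

    log_mu, log_sigma = [], []
    for w in vocab:
        counts = [seg[w] for seg in segments]  # Counter yields 0 if absent
        mu = sum(counts) / n_segments
        var = sum((c - mu) ** 2 for c in counts) / n_segments
        if var > 0:  # skip items whose count never varies (log undefined)
            log_mu.append(math.log(mu))
            log_sigma.append(0.5 * math.log(var))

    # Ordinary least-squares slope in log-log space.
    mx = sum(log_mu) / len(log_mu)
    my = sum(log_sigma) / len(log_sigma)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log_mu, log_sigma))
    sxx = sum((x - mx) ** 2 for x in log_mu)
    return sxy / sxx


def iid_zipf_sequence(length=100_000, vocab_size=500, seed=0):
    """Hypothetical helper: i.i.d. draws with Zipf-like (1/rank) weights."""
    rng = random.Random(seed)
    weights = [1.0 / (rank + 1) for rank in range(vocab_size)]
    return rng.choices(range(vocab_size), weights=weights, k=length)


if __name__ == "__main__":
    print(taylor_exponent(iid_zipf_sequence()))
```

For an i.i.d. sequence the per-window counts are approximately Poisson, so the fitted exponent should come out close to 0.5. Under this formulation, the clustering the paper studies in real text would show up as an exponent above that baseline.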
