A Comparison of Two Fluctuation Analyses for Natural Language Clustering Phenomena: Taylor and Ebeling & Neiman Methods

Kumiko Tanaka-Ishii; Shuntaro Takahashi

arXiv:2009.06257·cs.CL·May 5, 2021

TL;DR

This paper compares two fluctuation analysis methods, those of Taylor and of Ebeling & Neiman, in a large-scale application to natural language text. Both distinguish real text from i.i.d. sequences, and both can roughly differentiate text categories, the Taylor exponents more clearly so.

Contribution

It clarifies the similarities and differences between the two methods, which had not previously been established, and presents a large-scale application of both to text.

Findings

01. Both methods distinguish real text from i.i.d. sequences.

02. Taylor exponents can roughly classify text categories.

03. Both methods show potential in capturing script types.

Abstract

This article considers the fluctuation analysis methods of Taylor and Ebeling & Neiman. While both have been applied to various phenomena in the statistical mechanics domain, their similarities and differences have not been clarified. After considering their analytical aspects, this article presents a large-scale application of these methods to text. It is found that both methods can distinguish real text from independently and identically distributed (i.i.d.) sequences. Furthermore, it is found that the Taylor exponents acquired from words can roughly distinguish text categories; this is also the case for Ebeling and Neiman exponents, but to a lesser extent. Additionally, both methods show some possibility of capturing script kinds.
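
To make the i.i.d. comparison concrete, the following is a minimal sketch of a Taylor-type analysis under the standard formulation of Taylor's law: for each word, the standard deviation σ of its counts over fixed-size windows is assumed to scale with the mean count μ as σ ∝ μ^α. The function names `taylor_exponent` and `iid_zipf_tokens`, the `window` parameter, the least-squares fit, and the synthetic 1/rank vocabulary are all illustrative assumptions; the abstract does not specify the authors' exact procedure, and the Ebeling & Neiman method is not covered here.

```python
import math
import random
from collections import Counter


def taylor_exponent(tokens, window=100):
    """Estimate alpha in sigma ~ mu**alpha by least squares on log-log points."""
    n_windows = len(tokens) // window
    # Per-window word counts over non-overlapping windows (trailing remainder dropped).
    counts = [Counter(tokens[i * window:(i + 1) * window]) for i in range(n_windows)]
    vocab = set(tokens[:n_windows * window])

    xs, ys = [], []
    for word in vocab:
        series = [c[word] for c in counts]  # Counter returns 0 for absent words
        mu = sum(series) / n_windows
        var = sum((x - mu) ** 2 for x in series) / n_windows
        if var > 0:  # skip words with constant counts; log(0) is undefined
            xs.append(math.log(mu))
            ys.append(0.5 * math.log(var))  # log of the standard deviation

    # Ordinary least-squares slope of log(sigma) against log(mu).
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def iid_zipf_tokens(n_tokens, vocab_size=100, seed=0):
    """Draw an i.i.d. token sequence with Zipf-like (1/rank) word probabilities."""
    rng = random.Random(seed)
    words = list(range(vocab_size))
    weights = [1.0 / (rank + 1) for rank in words]
    return rng.choices(words, weights=weights, k=n_tokens)


# Hypothetical baseline: a synthetic i.i.d. sequence, not data from the paper.
alpha_iid = taylor_exponent(iid_zipf_tokens(50_000))
print(f"Taylor exponent of the i.i.d. baseline: {alpha_iid:.3f}")
```

For an i.i.d. sequence the estimated exponent should be close to 0.5, since each windowed count is binomial and its variance is roughly proportional to its mean. Running the same function on a tokenized real text and comparing against this baseline is one way to reproduce, in miniature, the kind of distinction the abstract describes.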
