Stochastic model for phonemes uncovers an author-dependency of their usage
Weibing Deng, Armen E. Allahverdyan

TL;DR
This paper models phoneme rank-frequency relations using the Dirichlet distribution, revealing that phoneme usage patterns depend on the author, unlike word frequency relations which follow Zipf's law and are author-independent.
Contribution
It introduces a stochastic Dirichlet-based model for phonemes and demonstrates that phoneme usage patterns are author-dependent, contrasting with word frequency universality.
Findings
Phoneme rank-frequency relations can be described by the Dirichlet distribution.
Author-dependency of phoneme usage is confirmed by multiple methods.
Word rank-frequency relations are author- and text-independent, following Zipf's law.
Abstract
We study rank-frequency relations for phonemes, the minimal units that still relate to linguistic meaning. We show that these relations can be described by the Dirichlet distribution, a direct analogue of the ideal-gas model in statistical mechanics. This description allows us to demonstrate that the rank-frequency relations for phonemes of a text do depend on its author. The author-dependency effect is not caused by the author's vocabulary (common words used in different texts), and is confirmed by several alternative means. This suggests that it can be directly related to phonemes. These features contrast with rank-frequency relations for words, which are both author- and text-independent and are governed by Zipf's law.
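
To make the contrast in the abstract concrete, here is a minimal sketch of how a rank-frequency curve can be generated from a Dirichlet distribution and set beside a Zipf curve. It is an illustration under stated assumptions, not the authors' fitting procedure: the symmetric form of the Dirichlet distribution, the concentration value `beta`, the inventory size `n_phonemes`, and all function names are hypothetical choices made for this example.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one probability vector from Dirichlet(alphas).

    Uses the standard construction: independent Gamma(alpha_i, 1)
    variates divided by their sum.
    """
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def rank_frequency(freqs):
    """Sort frequencies in decreasing order; index r-1 holds the rank-r frequency."""
    return sorted(freqs, reverse=True)


def zipf_frequencies(n, exponent=1.0):
    """Normalized Zipf frequencies f_r proportional to r**(-exponent), r = 1..n."""
    weights = [r ** -exponent for r in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)   # fixed seed so the sketch is reproducible
n_phonemes = 40          # assumed inventory size, for illustration only
beta = 1.0               # assumed symmetric concentration parameter

# One simulated "text": phoneme frequencies drawn from a symmetric Dirichlet,
# then ordered by rank.
phoneme_ranked = rank_frequency(sample_dirichlet([beta] * n_phonemes, rng))

# Zipf curve over the same number of ranks, for comparison.
zipf_ranked = zipf_frequencies(n_phonemes)

for r in (1, 2, 5, 10, 40):
    print(r, round(phoneme_ranked[r - 1], 4), round(zipf_ranked[r - 1], 4))
```

In this sketch, changing `beta` changes the shape of the ranked curve. That is the kind of parameter one could estimate per text and compare across authors, whereas the Zipf curve is fixed once its exponent is set.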
