Fast calculation of entropy with Zhang's estimator

Antoni Lozano; Bernardino Casas; Chris Bentz; Ramon Ferrer-i-Cancho

arXiv:1707.08290·cs.CL·July 27, 2017

Open Access

TL;DR

This paper introduces an efficient algorithm for estimating the entropy of types with Zhang's estimator. It exploits the fact that a text generally contains far fewer distinct frequency values than types, a property the authors examine in texts from more than 1000 languages.

Contribution

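The paper presents an efficient algorithm for entropy estimation with Zhang's estimator that works on the distinct frequencies of a text rather than on its individual types, and justifies this choice with an analysis of the statistical properties of texts from more than 1000 languages.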

Findings

1. Algorithm significantly reduces computation time
2. Effective across diverse languages
3. Supports large-scale linguistic entropy analysis

Abstract

Entropy is a fundamental property of a repertoire. Here, we present an efficient algorithm to estimate the entropy of types with the help of Zhang's estimator. The algorithm takes advantage of the fact that the number of different frequencies in a text is in general much smaller than the number of types. We justify the convenience of the algorithm by means of an analysis of the statistical properties of texts from more than 1000 languages. Our work opens up various possibilities for future research.
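The grouping idea in the abstract can be illustrated with a minimal sketch. It assumes the commonly cited form of Zhang's estimator, in which each type with frequency f in a sample of n tokens contributes (f/n) times an inner sum that depends only on f and n; that formula is not stated on this page, so treat it as an assumption. The function names (`zhang_inner`, `zhang_entropy_grouped`, `zhang_entropy_naive`) are hypothetical and this is not the authors' implementation. Since the inner sum is the same for all types sharing a frequency, it can be computed once per distinct frequency and weighted by the number of types with that frequency.

```python
from collections import Counter
from math import isclose


def zhang_inner(f, n):
    """Inner sum for one frequency value f in a sample of n tokens.

    Assumed form: sum over v = 1 .. n - f of
    (1 / v) * prod_{j=1..v} (1 - (f - 1) / (n - j)).
    """
    total = 0.0
    prod = 1.0
    for v in range(1, n - f + 1):
        # Extend the running product by the factor for j = v.
        prod *= 1.0 - (f - 1) / (n - v)
        total += prod / v
    return total


def zhang_entropy_naive(tokens):
    """Evaluate the inner sum once per type (entropy in nats)."""
    type_freq = Counter(tokens)
    n = sum(type_freq.values())
    return sum((f / n) * zhang_inner(f, n) for f in type_freq.values())


def zhang_entropy_grouped(tokens):
    """Evaluate the inner sum once per distinct frequency (entropy in nats)."""
    type_freq = Counter(tokens)
    n = sum(type_freq.values())
    # Frequency spectrum: frequency value -> number of types with that frequency.
    spectrum = Counter(type_freq.values())
    return sum(n_f * (f / n) * zhang_inner(f, n) for f, n_f in spectrum.items())


if __name__ == "__main__":
    text = "the cat sat on the mat and the dog sat on the log".split()
    print(zhang_entropy_naive(text), zhang_entropy_grouped(text))
```

Both functions return the same value up to floating-point rounding. The grouped version calls `zhang_inner` once per distinct frequency instead of once per type, which is where the saving comes from when many types share a frequency, as is typical for low-frequency words.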

Taxonomy

Topics: Advanced Text Analysis Techniques · Authorship Attribution and Profiling