On Probability Estimation via Relative Frequencies and Discount

Christopher Mattern

arXiv:1311.1723 · cs.IT · January 12, 2015 · 1 citation

TL;DR

This paper analyzes a probability estimation algorithm based on relative frequencies and discounting, providing theoretical guarantees on its efficiency in data compression, and explaining the empirical recency effect.

Contribution

It formulates Algorithm RFD, a typical probability estimator based on relative frequencies and frequency discount, and analyzes it theoretically by bounding the code length it requires above piecewise stationary models.

Findings

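01. The code length Algorithm RFD requires above an arbitrary piecewise stationary model is small, for both bounded and unbounded letter probabilities.

02. This theoretically confirms the recency effect of periodic frequency discount, which has often been observed empirically.

03. The analysis supports the practical use of frequency-discount estimators in statistical data compression algorithms.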

Abstract

Probability estimation is an elementary building block of every statistical data compression algorithm. In practice, probability estimation is often based on relative letter frequencies, which are scaled down when their sum becomes too large. Such algorithms are attractive in terms of memory requirements, running time and practical performance. However, there is still a lack of theoretical understanding. In this work we formulate a typical probability estimation algorithm based on relative frequencies and frequency discount, Algorithm RFD. Our main contribution is its theoretical analysis. We show that the code length it requires above an arbitrary piecewise stationary model with bounded and unbounded letter probabilities is small. This theoretically confirms the recency effect of periodic frequency discount, which has often been observed empirically.
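The idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's Algorithm RFD: the class name `FrequencyDiscountEstimator`, the function `code_length`, and the parameters `limit`, `discount` and `init` are all hypothetical, and the specific choices (initial count of 1 per letter, halving all counts once their sum exceeds `limit`) are assumptions made for illustration. The sketch estimates each letter's probability as its relative frequency, discounts the counts when their sum grows too large, and measures the ideal code length, the sum of -log2 p over the sequence.

```python
import math


class FrequencyDiscountEstimator:
    """Illustrative estimator: relative frequencies with periodic discount.

    Not the paper's Algorithm RFD; all parameter choices are assumptions.
    """

    def __init__(self, alphabet_size, limit=255, discount=0.5, init=1.0):
        # One (real-valued) frequency count per letter, all starting at `init`.
        self.counts = [init] * alphabet_size
        self.limit = limit
        self.discount = discount

    def prob(self, symbol):
        # Estimated probability = relative frequency of the letter.
        return self.counts[symbol] / sum(self.counts)

    def update(self, symbol):
        # Count the observed letter, then scale all counts down
        # if their sum has grown past the limit.
        self.counts[symbol] += 1.0
        if sum(self.counts) > self.limit:
            self.counts = [c * self.discount for c in self.counts]


def code_length(sequence, alphabet_size, **kwargs):
    """Ideal code length in bits when coding `sequence` with the estimator."""
    est = FrequencyDiscountEstimator(alphabet_size, **kwargs)
    total = 0.0
    for s in sequence:
        total += -math.log2(est.prob(s))
        est.update(s)
    return total


# A piecewise stationary toy input: the statistics change abruptly halfway.
sequence = [0] * 200 + [1] * 200
with_discount = code_length(sequence, 2, limit=16)
without_discount = code_length(sequence, 2, limit=10**9)  # limit never reached
print(with_discount, without_discount)
```

On this toy input the discounted estimator needs fewer bits than the same estimator with the discount effectively disabled, because old counts lose weight and the estimate adapts to the new segment faster. That is the recency effect in miniature; the paper's contribution is a proof of bounds of this kind, which a single example cannot replace.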

Taxonomy

Topics: Algorithms and Data Compression · Advanced Wireless Communication Techniques · Error Correcting Code Techniques