An Optimal Uniform Concentration Inequality for Discrete Entropies on Finite Alphabets in the High-dimensional Setting

Yunpeng Zhao

arXiv:2007.04547·math.PR·June 23, 2021

TL;DR

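This paper proves an exponential-decay concentration inequality for the log-likelihood of discrete random variables on a finite alphabet, uniform over all parameter values. It relaxes the rate condition of Zhao (2020) from $(K^2 \log K)/n = o(1)$ to $(\log K)^2/n = o(1)$, shows that the new rate is optimal, extends the result to misspecified log-likelihoods for grouped random variables, and gives applications in information theory.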

Contribution

The paper gives a uniform concentration inequality for discrete entropies with an optimal rate condition, improving the earlier bound of Zhao (2020) and extending it to misspecified log-likelihoods for grouped random variables.

Findings

01

Improved convergence rate from $(K^2 \log K)/n$ to $(\log K)^2/n$

02

Proved the rate $(\log K)^2/n = o(1)$ is optimal

03

Extended results to misspecified log-likelihoods for grouped variables
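
To give a feel for how much weaker the new rate condition is, the following minimal sketch evaluates the two quantities from the findings, $(K^2 \log K)/n$ and $(\log K)^2/n$, for a given alphabet size $K$ and sample size $n$. The function names `old_rate` and `new_rate` are hypothetical, constants from the paper's bounds are ignored, and natural logarithms are assumed; this only compares the orders of magnitude that must tend to zero.

```python
import math


def old_rate(K: int, n: int) -> float:
    """Quantity (K^2 log K) / n from the earlier result of Zhao (2020)."""
    return K**2 * math.log(K) / n


def new_rate(K: int, n: int) -> float:
    """Quantity (log K)^2 / n from the improved result."""
    return math.log(K) ** 2 / n


if __name__ == "__main__":
    n = 10_000
    for K in (10, 100, 1000):
        # For large K the old quantity can be far above 1 while the
        # new one stays small at the same sample size.
        print(K, old_rate(K, n), new_rate(K, n))
```

With $n$ fixed, the old quantity grows roughly like $K^2$ while the new one grows only like $(\log K)^2$, which is why the improved condition is relevant when $K$ grows with $n$.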

Abstract

We prove an exponential decay concentration inequality to bound the tail probability of the difference between the log-likelihood of discrete random variables on a finite alphabet and the negative entropy. The concentration bound we derive holds uniformly over all parameter values. The new result improves the convergence rate in an earlier result of Zhao (2020), from $(K^{2} \log K) / n = o(1)$ to $(\log K)^{2} / n = o(1)$, where $n$ is the sample size and $K$ is the size of the alphabet. We further prove that the rate $(\log K)^{2} / n = o(1)$ is optimal. The results are extended to misspecified log-likelihoods for grouped random variables. We give applications of the new result in information theory.
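
The tail probability described in the abstract can be explored numerically. The sketch below is an illustration under stated assumptions, not the paper's method: it assumes i.i.d. draws from a distribution `p` on `{0, ..., K-1}`, takes the log-likelihood to be the sample average of $\log p_{X_i}$ evaluated at the true probabilities, and estimates by Monte Carlo how often it deviates from the negative entropy $\sum_k p_k \log p_k$ by at least `eps`. All function names and parameters are hypothetical.

```python
import math
import random


def negative_entropy(p):
    """Return sum_k p_k log p_k, with the convention 0 * log 0 = 0."""
    return sum(q * math.log(q) for q in p if q > 0)


def avg_log_likelihood(sample, p):
    """Average of log p[x] over the sample (assumed form of the log-likelihood)."""
    return sum(math.log(p[x]) for x in sample) / len(sample)


def tail_frequency(p, n, eps, trials=200, seed=0):
    """Monte Carlo estimate of P(|avg log-likelihood - negative entropy| >= eps)."""
    rng = random.Random(seed)
    target = negative_entropy(p)
    support = range(len(p))
    hits = 0
    for _ in range(trials):
        # Draw n i.i.d. symbols from p.
        sample = rng.choices(support, weights=p, k=n)
        if abs(avg_log_likelihood(sample, p) - target) >= eps:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    p = [0.5, 0.25, 0.125, 0.125]
    for n in (50, 200, 800):
        print(n, tail_frequency(p, n, eps=0.1))
```

A simulation like this checks one parameter value at a time; the inequality in the paper is stronger in that its bound holds uniformly over all parameter values.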

Taxonomy

Topics: Statistical Methods and Inference · Bayesian Methods and Mixture Models · Markov Chains and Monte Carlo Methods