Streaming and Sublinear Approximation of Entropy and Information Distances
Sudipto Guha, Andrew McGregor, Suresh Venkatasubramanian

TL;DR
This paper gives optimal algorithms and tight bounds for estimating information-theoretic distances and entropy of distributions under sublinear resource constraints, in both the property testing (sublinear time) and streaming (sublinear space) models.
Contribution
It introduces tight bounds and optimal algorithms for estimating f-divergences and entropy in sublinear time and space models, resolving open questions in distribution property testing.
Findings
Optimal algorithms for Jensen-Shannon divergence and Hellinger distance estimation.
Tight bounds for entropy estimation in property testing.
First polylogarithmic space algorithm for entropy approximation in streaming.
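The quantities named in these findings can be made concrete with a small sketch. The code below computes them exactly for explicitly given discrete distributions and for a fully stored stream; it is not the paper's sublinear algorithm, only an illustration of what is being estimated. The function names, the base-2 logarithm, and the 1/2 normalization of the squared Hellinger distance are assumptions chosen for this example (conventions vary).

```python
import math
from collections import Counter


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence: average KL divergence to the midpoint m.

    With base-2 logs this lies in [0, 1]. The midpoint m[i] is positive
    whenever p[i] or q[i] is, so the KL terms are always well defined.
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def hellinger_sq(p, q):
    """Squared Hellinger distance, normalized (assumption) to lie in [0, 1]."""
    return 0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))


def empirical_entropy(stream):
    """Exact empirical entropy (bits) of the item frequencies in a stream.

    Stores one counter per distinct item, so space is linear in the number
    of distinct items; a streaming approximation aims to avoid exactly this.
    """
    counts = Counter(stream)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


if __name__ == "__main__":
    p = [0.5, 0.5, 0.0]
    q = [0.25, 0.25, 0.5]
    print("JS(p, q)      =", jensen_shannon(p, q))
    print("Hellinger^2   =", hellinger_sq(p, q))
    print("H(stream)     =", empirical_entropy("abab"))
```

Both distances are bounded and symmetric, which is the class of f-divergences the abstract refers to; the unbounded, asymmetric Kullback-Leibler divergence appears here only as a building block for the Jensen-Shannon divergence.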
Abstract
In many problems in data mining and machine learning, data items that need to be clustered or classified are not points in a high-dimensional space, but are distributions (points on a high-dimensional simplex). For distributions, natural measures of distance are not the ℓ_p norms and variants, but information-theoretic measures like the Kullback-Leibler distance, the Hellinger distance, and others. Efficient estimation of these distances is a key component in algorithms for manipulating distributions. Thus, sublinear resource constraints, either in time (property testing) or space (streaming), are crucial. We start by resolving two open questions regarding property testing of distributions. Firstly, we show a tight bound for estimating bounded, symmetric f-divergences between distributions in a general property testing (sublinear time) framework (the so-called combined oracle…
Taxonomy
Topics: Neural Networks and Applications
