Distribution Estimation with Side Information
Haricharan Balasundaram, Andrew Thangaraj

TL;DR
This paper explores how side information, like word similarities or known probability groupings, can improve discrete distribution estimation from samples, providing theoretical analysis and empirical validation.
Contribution
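It introduces two models of side information: a local model, in which the unknown distribution lies in the neighborhood of a known distribution, and a partial ordering model, in which the alphabet is partitioned into known higher- and lower-probability sets. For each model, it characterizes the resulting improvement in squared-error estimation risk.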
Findings
In both models, the available side information improves a squared-error estimation risk.
The size of the improvement is characterized theoretically for each model.
Simulations over natural language and synthetic data illustrate these gains.
Abstract
We consider the classical problem of discrete distribution estimation using i.i.d. samples in a novel scenario where additional side information is available on the distribution. In large-alphabet datasets such as text corpora, such side information arises naturally through word semantics/similarities that can be inferred by closeness of vector word embeddings, for instance. We consider two specific models for side information: a local model where the unknown distribution is in the neighborhood of a known distribution, and a partial ordering model where the alphabet is partitioned into known higher and lower probability sets. In both models, we theoretically characterize the improvement in a suitable squared-error risk because of the available side information. Simulations over natural language and synthetic data illustrate these gains.
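To make the local model concrete, the sketch below compares the plain empirical estimator with one that shrinks toward a known reference distribution. It is an illustration under stated assumptions, not the estimator analyzed in the paper: the function names, the mixing weight `weight`, the neighborhood size `eps`, and the use of summed squared error as the risk are all choices made here for demonstration.

```python
import numpy as np


def empirical_estimate(counts):
    """Plain empirical (relative-frequency) estimate from symbol counts."""
    return counts / counts.sum()


def shrink_toward_reference(counts, q, weight):
    """Hypothetical side-information estimator: a convex combination of the
    empirical estimate and a known reference distribution q."""
    return (1.0 - weight) * empirical_estimate(counts) + weight * q


def squared_error(p_hat, p):
    """Summed squared error between an estimate and the true distribution."""
    return float(np.sum((p_hat - p) ** 2))


def simulate(k=1000, n=500, eps=0.05, weight=0.5, trials=200, seed=0):
    """Monte Carlo estimate of the squared-error risk of both estimators.

    Assumed setup: the known reference q is drawn at random, and the unknown
    distribution p is a small perturbation of q, so p lies near q.
    """
    rng = np.random.default_rng(seed)
    q = rng.dirichlet(np.ones(k))          # known reference distribution
    noise = rng.dirichlet(np.ones(k))      # direction of the perturbation
    p = (1.0 - eps) * q + eps * noise      # unknown distribution close to q

    err_emp = 0.0
    err_side = 0.0
    for _ in range(trials):
        counts = rng.multinomial(n, p)     # n i.i.d. samples from p
        err_emp += squared_error(empirical_estimate(counts), p)
        err_side += squared_error(shrink_toward_reference(counts, q, weight), p)
    return err_emp / trials, err_side / trials


if __name__ == "__main__":
    risk_emp, risk_side = simulate()
    print(f"empirical estimator risk:        {risk_emp:.6f}")
    print(f"shrinkage (side info) risk:      {risk_side:.6f}")
```

With an alphabet larger than the sample size and a reference distribution close to the truth, the shrinkage estimate trades a small bias for a large reduction in variance. How much is gained depends on `eps` and `weight`, which are free parameters of this sketch.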
Taxonomy
Topics: Topic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques
