
TL;DR
This paper introduces 'dependent component analysis': given three copies of the same true data, each corrupted by independent and invertible noise, one can infer the true underlying distribution to the highest possible precision. The approach is related in spirit to independent component analysis.
Contribution
The paper shows that three copies of the true data, each under independent and invertible noise, suffice to recover the true distribution when the data and the observations share the same finite alphabet. It calls this approach dependent component analysis.
Findings
On a shared finite alphabet, exactly three copies are needed to determine the distribution p to the highest possible precision.
Recovery relies on using the copies in parallel, each under independent noise that is invertible in the matrix sense.
Generalizations to different alphabet sizes are provided.
Abstract
This work is motivated by a question at the heart of unsupervised learning approaches: Assume we are collecting a number K of (subjective) opinions about some event E from K different agents. Can we infer E from them? Prima facie this seems impossible, since the agents may be lying. We model this task by letting the events be distributed according to some distribution p and the task is to estimate p under unknown noise. Again, this is impossible without additional assumptions. We report here the finding of very natural such assumptions - the availability of multiple copies of the true data, each under independent and invertible (in the sense of matrices) noise, is already sufficient: If the true distribution and the observations are modeled on the same finite alphabet, then the number of such copies needed to determine p to the highest possible precision is exactly three! This result…
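The observation model described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's method: the function names `apply_channel` and `joint_of_copies`, the binary alphabet, and all numeric values are hypothetical. The sketch shows why a single noisy copy cannot identify p (a noisy view of p looks the same as a noiseless view of a different distribution) and builds the joint distribution of three copies, which is the object the paper's result concerns.

```python
from itertools import product


def apply_channel(W, p):
    """Output distribution W p, where W[y][x] = Pr(observe y | true symbol x)."""
    return [sum(W[y][x] * p[x] for x in range(len(p))) for y in range(len(W))]


def joint_of_copies(p, channels):
    """Joint distribution of the noisy copies.

    Every copy sees the same true symbol x drawn from p, then passes it
    through its own channel, independently of the other copies.
    """
    n_out = len(channels[0])
    joint = {}
    for ys in product(range(n_out), repeat=len(channels)):
        total = 0.0
        for x, px in enumerate(p):
            term = px
            for W, y in zip(channels, ys):
                term *= W[y][x]
            total += term
        joint[ys] = total
    return joint


# Hypothetical example values on a binary alphabet (not taken from the paper).
p = [0.7, 0.3]
W1 = [[0.9, 0.2], [0.1, 0.8]]    # each column sums to 1; matrix is invertible
W2 = [[0.8, 0.1], [0.2, 0.9]]
W3 = [[0.95, 0.3], [0.05, 0.7]]
identity = [[1.0, 0.0], [0.0, 1.0]]

# One copy: "p seen through W1" and "q seen without noise" give the same data.
q = apply_channel(W1, p)
q_alt = apply_channel(identity, q)

# Three copies: a joint distribution over triples (y1, y2, y3).
joint = joint_of_copies(p, [W1, W2, W3])
```

Here `q` and `q_alt` coincide, so one copy alone cannot separate the true distribution from the noise. The three-copy `joint` is not a product of its marginals, because all copies depend on the same hidden symbol; that dependence is what the paper's approach exploits.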
