Identifying Correlation in Stream of Samples
Zhenhao Gu, Hao Zhang

TL;DR
This paper introduces a space-efficient algorithm for detecting correlation between variables in large sample streams, using hash functions and compressed counters to estimate independence with high accuracy.
Contribution
A novel counter matrix algorithm that estimates the $\ell_2$ independence metric efficiently in large spaces, outperforming existing sketching methods in speed and space usage.
Findings
Achieves $1 \pm \epsilon$ multiplicative error with high probability
Uses $O(\epsilon^{-4} \log \delta^{-1})$ space, which is a very loose bound
Faster and at least twice as space-efficient as state-of-the-art algorithms
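To get a feel for how the stated space bound scales, the following minimal sketch evaluates $\epsilon^{-4} \log \delta^{-1}$ for given parameters. The function name `space_bound_scale` is hypothetical, the natural logarithm is an assumption, and the constant hidden by the big-O notation is ignored, so the result indicates growth only, not an actual counter count.

```python
import math


def space_bound_scale(epsilon: float, delta: float) -> float:
    """Evaluate epsilon^-4 * ln(1/delta), ignoring the big-O constant.

    Assumptions (not from the paper): natural logarithm, constant factor 1.
    """
    if not (0.0 < epsilon < 1.0):
        raise ValueError("epsilon must be in (0, 1)")
    if not (0.0 < delta < 1.0):
        raise ValueError("delta must be in (0, 1)")
    return epsilon ** -4 * math.log(1.0 / delta)


if __name__ == "__main__":
    # Halving epsilon multiplies the bound by 2^4 = 16.
    for eps in (0.2, 0.1, 0.05):
        print(eps, round(space_bound_scale(eps, 0.01)))
```

The fourth-power dependence on $\epsilon$ dominates: tightening the error target is far more expensive than tightening the failure probability, which only enters logarithmically.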
Abstract
Determining whether two random variables are independent or correlated, given their samples, has been a fundamental problem in statistics. However, how to do so space-efficiently when the number of states is large has not been well studied. We propose a new, simple counter matrix algorithm, which utilizes hash functions and a compressed counter matrix to give an unbiased estimate of the $\ell_2$ independence metric. With $O(\epsilon^{-4} \log \delta^{-1})$ space (a very loose bound), we can guarantee $1 \pm \epsilon$ multiplicative error with probability at least $1 - \delta$. We also compare our algorithm with the state-of-the-art sketching of sketches algorithm and show that our algorithm is effective, faster, and at least 2 times more space-efficient.
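For reference, the quantity being estimated can be computed exactly when memory is not a concern. The sketch below is not the paper's counter matrix algorithm; it is a brute-force baseline that stores every observed pair. It assumes the $\ell_2$ independence metric is the Euclidean distance between the empirical joint distribution and the product of the empirical marginals (some treatments use the squared value), and the function name `l2_independence` is hypothetical.

```python
from collections import Counter
from math import sqrt


def l2_independence(samples):
    """Exact l2 distance between the empirical joint distribution and the
    product of the empirical marginals.

    Brute-force baseline (assumed definition, not the paper's algorithm):
    it needs space proportional to the number of distinct values seen,
    which is what a streaming estimator is meant to avoid.
    """
    samples = list(samples)
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")

    joint = Counter(samples)                 # counts of (x, y) pairs
    left = Counter(x for x, _ in samples)    # marginal counts of x
    right = Counter(y for _, y in samples)   # marginal counts of y

    total = 0.0
    # Pairs never observed jointly still contribute through the product term.
    for x, cx in left.items():
        for y, cy in right.items():
            diff = joint.get((x, y), 0) / n - (cx / n) * (cy / n)
            total += diff * diff
    return sqrt(total)


if __name__ == "__main__":
    independent = [(0, 0), (0, 1), (1, 0), (1, 1)]
    correlated = [(0, 0), (1, 1)]
    print(l2_independence(independent))
    print(l2_independence(correlated))
```

A value of zero means the empirical joint distribution factorizes exactly; larger values indicate stronger dependence between the two coordinates.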
Taxonomy
Topics: Blind Source Separation Techniques · Face and Expression Recognition · Random Matrices and Applications
