Identifying Correlation in Stream of Samples

Zhenhao Gu; Hao Zhang

arXiv:2211.10137 · cs.DS · November 21, 2022

TL;DR

This paper introduces a space-efficient algorithm for detecting correlation between variables in large sample streams, using hash functions and compressed counters to estimate independence with high accuracy.

Contribution

A novel counter matrix algorithm that estimates the $\ell_2$ independence metric efficiently in large spaces, outperforming existing sketching methods in speed and space usage.

Findings

01

Achieves $1 \pm \epsilon$ multiplicative error with probability at least $1 - \delta$

02

Uses $O(\epsilon^{-4} \log \delta^{-1})$ space, which the authors describe as a very loose bound

03

Faster and at least twice as space-efficient as the state-of-the-art sketching of sketches algorithm

Abstract

Determining whether two random variables are independent or correlated, given their samples, has been a fundamental problem in statistics. However, how to do so space-efficiently when the number of states is large has not been well studied. We propose a new, simple counter matrix algorithm, which uses hash functions and a compressed counter matrix to give an unbiased estimate of the $\ell_2$ independence metric. With $O(\epsilon^{-4} \log \delta^{-1})$ space (a very loose bound), we can guarantee $1 \pm \epsilon$ multiplicative error with probability at least $1 - \delta$. We also compare our algorithm with the state-of-the-art sketching of sketches algorithm and show that our algorithm is effective, faster, and at least 2 times more space-efficient.
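To make the quantity being estimated concrete, here is a minimal sketch of an exact, non-streaming baseline. It assumes the $\ell_2$ independence metric is the squared $\ell_2$ distance between the empirical joint distribution and the product of the empirical marginals; the abstract does not spell out the definition, so treat that as an assumption. The function name `l2_independence_exact` is hypothetical and is not taken from the paper or its repository. This is not the counter matrix algorithm itself: it stores one counter per observed pair, which is exactly the cost a compressed counter matrix is meant to avoid.

```python
from collections import Counter


def l2_independence_exact(samples):
    """Exact baseline (assumed definition, not the paper's algorithm):
    sum over (x, y) of (P(x, y) - p(x) * q(y))**2, where P is the empirical
    joint distribution and p, q are the empirical marginals."""
    samples = list(samples)
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")

    joint = Counter(samples)               # counts of each (x, y) pair
    px = Counter(x for x, _ in samples)    # marginal counts of x
    py = Counter(y for _, y in samples)    # marginal counts of y

    total = 0.0
    for x in px:
        for y in py:
            # Counter returns 0 for pairs that never occurred together.
            diff = joint[(x, y)] / n - (px[x] / n) * (py[y] / n)
            total += diff * diff
    return total


independent = [(0, 0), (0, 1), (1, 0), (1, 1)]
correlated = [(0, 0), (1, 1)]
print(l2_independence_exact(independent))
print(l2_independence_exact(correlated))
```

A value near zero suggests independence, while larger values indicate correlation. The nested loop touches every pair of observed states, so memory and time grow with the product of the two state-space sizes, which is the regime where a sketch with space depending only on $\epsilon$ and $\delta$ becomes attractive.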

Code & Models

Repositories

GZHoffie/independence-in-streams (official)

Taxonomy

Topics: Blind Source Separation Techniques · Face and Expression Recognition · Random Matrices and Applications