A faster subquadratic algorithm for finding outlier correlations
Matti Karppa, Petteri Kaski, Jukka Kohonen

TL;DR
This paper presents a faster randomized algorithm for finding the few strongly correlated pairs among many weakly correlated variables. For Boolean data it improves on the subquadratic algorithm of G. Valiant.
Contribution
The authors give a randomized subquadratic algorithm for outlier correlation detection on Boolean inputs. It improves the running-time bound of G. Valiant's algorithm [FOCS 2012; J. ACM 2015] and comes with applications.
Findings
Achieves subquadratic running time for Boolean inputs normalized to unit Euclidean length.
States the running-time bound in terms of the matrix multiplication exponents M(·,·).
Includes applications to the light bulb problem and to sparse Boolean functions.
Abstract
We study the problem of detecting outlier pairs of strongly correlated variables among a collection of variables with otherwise weak pairwise correlations. After normalization, this task amounts to the geometric task where we are given as input a set of $n$ vectors with unit Euclidean norm and dimension $d$, and for some constants $0<\tau<\rho<1$, we are asked to find all the outlier pairs of vectors whose inner product is at least $\rho$ in absolute value, subject to the promise that all but at most $q$ pairs of vectors have inner product at most $\tau$ in absolute value. Improving on an algorithm of G. Valiant [FOCS 2012; J. ACM 2015], we present a randomized algorithm that for Boolean inputs ($\{-1,1\}$-valued data normalized to unit Euclidean length) runs in time \[ \tilde O\bigl(n^{\max\,\{1-\gamma+M(\Delta\gamma,\gamma),\,M(1-\gamma,2\Delta\gamma)\}}+qdn^{2\gamma}\bigr)\,,…
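To make the problem statement concrete, here is a minimal Python sketch of the naive quadratic baseline that subquadratic algorithms aim to beat. It is not the authors' algorithm and uses no fast matrix multiplication. It only checks every pair of unit vectors against a threshold `rho` on the absolute inner product. The function names, the `planted_corr` parameter, and the planted test instance (in the spirit of the light bulb problem) are assumptions made for illustration.

```python
import itertools
import math
import random


def normalize_boolean(rows):
    """Scale {-1, 1}-valued rows of dimension d to unit Euclidean length."""
    d = len(rows[0])
    scale = 1.0 / math.sqrt(d)
    return [[x * scale for x in row] for row in rows]


def outlier_pairs_bruteforce(vectors, rho):
    """Return all pairs (i, j), i < j, with |<v_i, v_j>| >= rho.

    Naive baseline taking O(n^2 d) time; NOT the paper's subquadratic method.
    """
    pairs = []
    for i, j in itertools.combinations(range(len(vectors)), 2):
        inner = sum(a * b for a, b in zip(vectors[i], vectors[j]))
        if abs(inner) >= rho:
            pairs.append((i, j))
    return pairs


def planted_instance(n, d, planted_corr, seed=0):
    """Hypothetical test instance: n random sign vectors of dimension d.

    Vector 1 is a noisy copy of vector 0 that keeps each coordinate with
    probability (1 + planted_corr) / 2, so their expected inner product
    after normalization is planted_corr.
    """
    rng = random.Random(seed)
    rows = [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(n)]
    keep = (1 + planted_corr) / 2
    rows[1] = [x if rng.random() < keep else -x for x in rows[0]]
    return normalize_boolean(rows)


if __name__ == "__main__":
    vectors = planted_instance(n=30, d=2000, planted_corr=0.8, seed=1)
    print(outlier_pairs_bruteforce(vectors, rho=0.5))
```

With these sizes the background inner products concentrate near zero (standard deviation about 1/sqrt(d)), so the threshold 0.5 should separate the planted pair from the rest with overwhelming probability. The cost grows quadratically in the number of vectors, which is the bottleneck the abstract's running-time bound addresses.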
