Efficient Parallel Computation of the Estimated Covariance Matrix
Oded Green, Lior David, Ami Galperin, Yitzhak Birk

TL;DR
The paper introduces a novel parallel algorithm for computing the estimated covariance matrix. It achieves high speedups on many-core architectures by avoiding repeated multiplications and inter-core synchronization.
Contribution
A new parallel algorithm for computing the estimated covariance matrix that keeps core utilization high without inter-core synchronization. It is demonstrated on both many-core and conventional multi-core architectures.
Findings
Linear speedup up to 64 cores
~85X speedup on a 128-core architecture
20X faster than the baseline on quad-core systems
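Parallel efficiency is speedup S divided by core count p. Assuming the 64- and 128-core speedups are measured against a single core of the same machine, the figures work out approximately as follows. The 85X value is itself approximate. The quad-core figure is stated relative to a baseline whose configuration is not given here, so it is left out.

$$
E(p) = \frac{S(p)}{p}, \qquad E(p) \approx 1 \ \text{for } p \le 64, \qquad E(128) \approx \frac{85}{128} \approx 0.66
$$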
Abstract
Computation of a signal's estimated covariance matrix is an important building block in signal processing, e.g., for spectral estimation. Each matrix element is a sum of products of elements in the input matrix taken over a sliding window. Any given product contributes to multiple output elements, thereby complicating parallelization. We present a novel algorithm that attains very high parallelism without repeating multiplications or requiring inter-core synchronization. Key to this is the assignment to each core of distinct diagonal segments of the output matrix, selected such that no multiplications need to be repeated yet only one core writes to any given output-matrix element, and exploitation of a shared memory (including L1 cache) that obviates the need for a corresponding awkward partitioning of the memory among cores. Implementation on Plurality's HyperCore shared-memory…
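The diagonal idea in the abstract can be illustrated with a small sketch. For simplicity it assumes a real-valued 1-D signal `x` of length N and a window length `m`. It uses the unnormalized estimate R[i][j] = sum over n = 0 .. N-m of x[n+i]·x[n+j]. The paper describes its input as a matrix, so its exact formulation may differ.

Under this assumption, every entry on diagonal d = j - i is a sliding-window sum of the same product sequence x[k]·x[k+d]. Each product can therefore be formed once and reused by all entries on that diagonal. If whole diagonals are assigned to workers, every output element has exactly one writer, so no locks are needed.

The helper names, the pairing of diagonal d with diagonal m - 1 - d, and the use of Python threads are all illustrative assumptions. This is not the authors' HyperCore implementation, which assigns diagonal segments whose exact selection is not described here.

```python
from concurrent.futures import ThreadPoolExecutor


def covariance_naive(x, m):
    """Reference definition (assumed): R[i][j] = sum_{n=0}^{N-m} x[n+i] * x[n+j]."""
    w = len(x) - m + 1  # number of sliding windows
    return [[sum(x[n + i] * x[n + j] for n in range(w)) for j in range(m)]
            for i in range(m)]


def diagonal_sums(x, m, d):
    """Return R[i][i+d] for i = 0 .. m-d-1 (hypothetical helper).

    Each product x[k] * x[k+d] is computed exactly once; successive entries on
    the diagonal slide the window by one: add the entering product, drop the
    leaving one.
    """
    w = len(x) - m + 1
    length = m - d                      # number of entries on diagonal d
    p = [x[k] * x[k + d] for k in range(length - 1 + w)]
    s = sum(p[:w])                      # R[0][d] = p[0] + ... + p[w-1]
    out = [s]
    for i in range(1, length):
        s += p[i + w - 1] - p[i - 1]    # window now covers p[i] .. p[i+w-1]
        out.append(s)
    return out


def covariance_by_diagonals(x, m, workers=4):
    """Fill R by giving each task whole diagonals (hypothetical partitioning)."""
    if not 1 <= m <= len(x):
        raise ValueError("window length m must satisfy 1 <= m <= len(x)")
    R = [[0] * m for _ in range(m)]

    # Pair diagonal d with m-1-d so tasks form roughly equal numbers of products.
    tasks = [[d] if d == m - 1 - d else [d, m - 1 - d]
             for d in range((m + 1) // 2)]

    def run(diagonals):
        for d in diagonals:
            for i, v in enumerate(diagonal_sums(x, m, d)):
                R[i][i + d] = v         # only this task writes diagonal d ...
                R[i + d][i] = v         # ... and its mirror (R is symmetric)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(run, tasks))      # list() re-raises any worker exception
    return R


x = [1, 2, 3, 4, 5, 6]
print(covariance_by_diagonals(x, 3))
# prints [[30, 40, 50], [40, 54, 68], [50, 68, 86]]
```

Diagonal d needs N - d products and diagonal m - 1 - d needs N - m + 1 + d. Each pair therefore forms 2N - m + 1 products, and only the middle diagonal (when m is odd) runs alone. In standard CPython, the GIL keeps these threads from speeding up pure-Python arithmetic. The sketch is meant to show the partitioning (disjoint writes, no repeated products), not to measure speedup.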
Taxonomy
Topics: Direction-of-Arrival Estimation Techniques · Sparse and Compressive Sensing Techniques · Advanced Adaptive Filtering Techniques
