Distributed-memory $\mathcal{H}$-matrix Algebra I: Data Distribution and Matrix-vector Multiplication
Yingzhou Li, Jack Poulson, Lexing Ying

TL;DR
This paper presents a new data distribution scheme and a distributed-memory algorithm for $\mathcal{H}$-matrix-vector multiplication that improves scalability and reduces communication costs, enabling efficient large-scale computations.
Contribution
It introduces a novel data distribution scheme that avoids the expensive scheduling procedure used in previous work, and a tree-communication algorithm that reduces latency cost in distributed $\mathcal{H}$-matrix-vector multiplication.
Findings
Achieves $O\left(\frac{N \log N}{P} + \text{latency and bandwidth terms}\right)$ complexity.
Demonstrates good parallel efficiency on thousands of processes.
Applicable to 2D and 3D problems of various sizes.
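To make the scaling claim concrete, the following is a minimal sketch of a per-process cost model, not the authors' code. It keeps the $\frac{N \log N}{P}$ compute term from the findings above; the logarithmic forms chosen for the latency and bandwidth terms, the function names `modeled_time` and `parallel_efficiency`, and every constant are assumptions made only for illustration.

```python
import math


def modeled_time(n, p, alpha=1e-6, beta=1e-9, flop_time=1e-9):
    """Modeled per-process time for matrix size n on p processes.

    Assumed (hypothetical) model: compute ~ n*log2(n)/p, plus a latency
    term alpha*log2(p) and a bandwidth term beta*log2(p)**2. The constants
    are placeholders, not measurements from the paper.
    """
    compute = flop_time * n * math.log2(n) / p
    latency = alpha * math.log2(p)          # zero when p == 1
    bandwidth = beta * math.log2(p) ** 2    # zero when p == 1
    return compute + latency + bandwidth


def parallel_efficiency(n, p, **constants):
    """Strong-scaling efficiency T(n, 1) / (p * T(n, p)) under the model."""
    return modeled_time(n, 1, **constants) / (p * modeled_time(n, p, **constants))


if __name__ == "__main__":
    n = 2 ** 20
    for p in (1, 64, 1024, 4096):
        print(p, round(parallel_efficiency(n, p), 4))
```

Under this model, efficiency stays close to 1 as long as the compute term dominates the communication terms, which is the regime in which good scaling on thousands of processes would be expected.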
Abstract
We introduce a data distribution scheme for $\mathcal{H}$-matrices and a distributed-memory algorithm for $\mathcal{H}$-matrix-vector multiplication. Our data distribution scheme avoids an expensive $\Omega(P^2)$ scheduling procedure used in previous work, where $P$ is the number of processes, while data balancing is well-preserved. Based on the data distribution, our distributed-memory algorithm evenly distributes all computations among $P$ processes and adopts a novel tree-communication algorithm to reduce the latency cost. The overall complexity of our algorithm is $O\left(\frac{N \log N}{P} + \alpha \log P + \beta \log^2 P\right)$ for $\mathcal{H}$-matrices under the weak admissibility condition, where $N$ is the matrix size, $\alpha$ denotes the latency, and $\beta$ denotes the inverse bandwidth. Numerically, our algorithm is applied to address both two- and three-dimensional problems of…
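The abstract does not spell out the tree-communication algorithm, so the block below is only a generic binary-tree reduction, simulated in a single Python process, that illustrates why tree-shaped communication needs a logarithmic number of rounds rather than one message per process. The function name `tree_reduce` and the list-based stand-in for per-process data are assumptions for illustration, not the paper's method.

```python
def tree_reduce(partials):
    """Sum one value per simulated process using a binary tree.

    Returns (total, rounds). In each round, rank r receives from rank
    r + stride if that rank exists, so the number of rounds is
    ceil(log2(P)) for P simulated processes.
    """
    values = list(partials)
    p = len(values)
    if p == 0:
        raise ValueError("need at least one process")
    rounds = 0
    stride = 1
    while stride < p:
        for r in range(0, p, 2 * stride):
            partner = r + stride
            if partner < p:
                values[r] += values[partner]  # simulated receive-and-add
        stride *= 2
        rounds += 1
    return values[0], rounds


if __name__ == "__main__":
    print(tree_reduce(range(8)))     # 8 simulated processes
    print(tree_reduce(range(1000)))  # non-power-of-two process count
```

For 8 simulated processes the reduction finishes in 3 rounds, and for 1000 it finishes in 10, whereas a naive gather to one rank would take a number of sequential messages proportional to the process count.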
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Complexity and Algorithms in Graphs · Matrix Theory and Algorithms
