Balanced Partitioning of Several Cache-Oblivious Algorithms

Yuan Tang; Weiguo Gao

arXiv:2011.01441 · cs.DC · November 4, 2020

Open Access

TL;DR

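This paper introduces PACO (Processor-Aware but Cache-Oblivious), a way of partitioning cache-oblivious algorithms that achieves perfect strong scaling on an arbitrary number of processors, even a prime number, within a certain range in a shared-memory setting. The approach is demonstrated on LCS, 1D, GAP, classic rectangular matrix multiplication on a semiring, and Strassen's algorithm.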

Contribution

It presents PACO, a partitioning technique that lets parallel cache-oblivious algorithms scale on an arbitrary number of processors, including prime counts, within a certain range in a shared-memory setting.

Findings

01

PACO achieves perfect strong scaling for multiple algorithms within a certain range of processor counts.

02

PACO algorithms outperform the state of the art in scalability and cache complexity.

03

Preliminary experiments confirm significant performance gains over existing methods.

Abstract

Frigo et al. proposed an ideal cache model and a recursive technique to design sequential cache-efficient algorithms in a cache-oblivious fashion. Ballard et al. pointed out that it is a fundamental open problem to extend the technique to an arbitrary architecture. Ballard et al. raised another open question on how to parallelize Strassen's algorithm exactly and efficiently on an arbitrary number of processors. We propose a novel way of partitioning a cache-oblivious algorithm to achieve perfect strong scaling on an arbitrary number, even a prime number, of processors within a certain range in a shared-memory setting. Our approach is Processor-Aware but Cache-Oblivious (PACO). We demonstrate our approach on several important cache-oblivious algorithms, including LCS, 1D, GAP, classic rectangular matrix multiplication on a semiring, and Strassen's algorithm. We discuss how to extend…
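To make the idea of processor-aware partitioning concrete, here is a minimal sketch. It is not the paper's PACO algorithm, whose details are not given in the abstract. It assumes a simple illustrative scheme: the three-dimensional iteration space of a rectangular matrix multiplication is cut recursively along its longest dimension, and each cut is placed in proportion to how the processor count `p` is divided, so that even a prime `p` receives pieces of nearly equal volume. The names `Box`, `volume`, and `partition` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical type: (n, m, k) extents of a matmul iteration space.
Box = Tuple[int, int, int]


def volume(box: Box) -> int:
    """Number of scalar multiply-adds in the box."""
    n, m, k = box
    return n * m * k


def partition(box: Box, p: int) -> List[Box]:
    """Split `box` into p sub-boxes of near-equal volume.

    Illustrative assumption (not taken from the paper): cut the longest
    dimension so the two sides are proportional to a floor(p/2) : ceil(p/2)
    split of the processors, then recurse on each side.
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    if p == 1:
        return [box]

    p_left = p // 2
    p_right = p - p_left

    # Choose the longest dimension, as in recursive cache-oblivious matmul.
    axis = max(range(3), key=lambda i: box[i])
    length = box[axis]
    if length < 2:
        raise ValueError("box is too small to split among p processors")

    # Place the cut in proportion to the processor split (rounded),
    # keeping both sides non-empty.
    cut = (length * p_left + p // 2) // p
    cut = min(max(cut, 1), length - 1)

    left = tuple(cut if i == axis else box[i] for i in range(3))
    right = tuple(length - cut if i == axis else box[i] for i in range(3))
    return partition(left, p_left) + partition(right, p_right)


if __name__ == "__main__":
    parts = partition((1024, 1024, 1024), 7)  # 7 is prime
    sizes = [volume(b) for b in parts]
    print(len(parts), "pieces, imbalance:", max(sizes) / min(sizes))
```

In this sketch the cache size never appears, so the scheme stays cache-oblivious in spirit, while the processor count alone decides where the cuts go. Each resulting piece could then be handed to a sequential cache-oblivious kernel.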

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Stochastic Gradient Optimization Techniques · Advanced Data Storage Technologies