Composing Loop-carried Dependence with Other Loops

Kazem Cheshmi; Michelle Mills Strout; Maryam Mehri Dehnavi

arXiv:2111.12238·cs.PL·November 25, 2021

TL;DR

This paper introduces Sparse fusion, a compile-time loop transformation with runtime scheduling that generates parallel code for pairs of sparse matrix kernels in which at least one kernel has loop-carried dependencies. The generated code outperforms ParSy and MKL by 1.6X and 5.1X on average, respectively.

Contribution

The paper presents a novel inspection strategy, paired with code transformations, that generates parallel fused code for sparse kernel combinations and optimizes that code for data locality and load balance.

Findings

01 Sparse fusion outperforms ParSy and MKL by 1.6X and 5.1X on average, respectively.

02 Sparse fusion surpasses the LBC and DAGP strategies by 5.1X and 7.2X on average, respectively.

03 Compared with optimizing each sparse kernel separately, the approach reduces synchronization overheads and improves cache efficiency.

Abstract

Sparse fusion is a compile-time loop transformation and runtime scheduling implemented as a domain-specific code generator. Sparse fusion generates efficient parallel code for the combination of two sparse matrix kernels where at least one of the kernels has loop-carried dependencies. Available implementations optimize individual sparse kernels. When optimized separately, the irregular dependence patterns of sparse kernels create synchronization overheads and load imbalance, and their irregular memory access patterns result in inefficient cache usage, which reduces parallel efficiency. Sparse fusion uses a novel inspection strategy with code transformations to generate parallel fused code for sparse kernel combinations that is optimized for data locality and load balance. Code generated by Sparse fusion outperforms the existing implementations ParSy and MKL on average 1.6X and 5.1X…
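To make the kind of kernel combination described in the abstract concrete, the following is a minimal sequential sketch, not the paper's inspector or its generated parallel code. It assumes, for illustration only, that the kernel with loop-carried dependencies is a sparse lower-triangular solve (CSR storage, diagonal stored last in each row) and that the second kernel is a sparse matrix-vector product (CSC storage). All function names and the sample matrices are hypothetical.

```python
# Illustrative only: a hand-written fusion of two sparse kernels.
# Assumptions (not from the paper): L is lower triangular in CSR with the
# diagonal entry stored last in each row; A is stored in CSC.

def sptrsv_csr(n, rowptr, colidx, vals, b):
    """Solve L x = b. Row i reads x[j] for j < i: a loop-carried dependence."""
    x = [0.0] * n
    for i in range(n):
        s = b[i]
        for k in range(rowptr[i], rowptr[i + 1] - 1):
            s -= vals[k] * x[colidx[k]]
        x[i] = s / vals[rowptr[i + 1] - 1]  # divide by the diagonal entry
    return x


def spmv_csc(n, colptr, rowidx, vals, x):
    """Compute y = A x column by column; column j needs only x[j]."""
    y = [0.0] * n
    for j in range(n):
        for k in range(colptr[j], colptr[j + 1]):
            y[rowidx[k]] += vals[k] * x[j]
    return y


def fused_sptrsv_spmv(n, l_rowptr, l_colidx, l_vals,
                      a_colptr, a_rowidx, a_vals, b):
    """Interleave both kernels: use x[i] for column i of A right after it is solved."""
    x = [0.0] * n
    y = [0.0] * n
    for i in range(n):
        s = b[i]
        for k in range(l_rowptr[i], l_rowptr[i + 1] - 1):
            s -= l_vals[k] * x[l_colidx[k]]
        x[i] = s / l_vals[l_rowptr[i + 1] - 1]
        # x[i] was just written and is likely still in cache when it is reused here.
        for k in range(a_colptr[i], a_colptr[i + 1]):
            y[a_rowidx[k]] += a_vals[k] * x[i]
    return x, y


# Hypothetical 3x3 inputs.
# L = [[2, 0, 0], [1, 1, 0], [0, 3, 4]] in CSR
L_ROWPTR, L_COLIDX, L_VALS = [0, 1, 3, 5], [0, 0, 1, 1, 2], [2.0, 1.0, 1.0, 3.0, 4.0]
# A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in CSC
A_COLPTR, A_ROWIDX, A_VALS = [0, 2, 3, 5], [0, 2, 1, 0, 2], [1.0, 4.0, 3.0, 2.0, 5.0]
B = [2.0, 3.0, 11.0]

x_sep = sptrsv_csr(3, L_ROWPTR, L_COLIDX, L_VALS, B)
y_sep = spmv_csc(3, A_COLPTR, A_ROWIDX, A_VALS, x_sep)
x_fused, y_fused = fused_sptrsv_spmv(3, L_ROWPTR, L_COLIDX, L_VALS,
                                     A_COLPTR, A_ROWIDX, A_VALS, B)
```

In this sketch the fused loop produces the same `x` and `y` as running the two kernels back to back, but each `x[i]` is consumed immediately after it is produced. Parallelizing such a loop is the hard part: iterations of the triangular solve depend on earlier ones in a pattern set by the sparsity of `L`, which is why a runtime inspection step is needed to build a schedule.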

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Interconnection Networks and Systems