Composing Loop-carried Dependence with Other Loops
Kazem Cheshmi, Michelle Mills Strout, Maryam Mehri Dehnavi

TL;DR
This paper introduces Sparse fusion, a compile-time loop transformation with runtime scheduling that generates parallel code for a pair of sparse matrix kernels in which at least one kernel has loop-carried dependencies. The generated code outperforms ParSy and MKL by 1.6X and 5.1X on average, respectively.
Contribution
The paper presents a novel inspection strategy, paired with code transformations, that generates parallel fused code for sparse kernel combinations and optimizes that code for data locality and load balance.
Findings
Code generated by Sparse fusion outperforms ParSy and MKL by 1.6X and 5.1X on average, respectively.
It surpasses LBC and DAGP strategies by 5.1X and 7.2X on average.
Compared with optimizing each kernel separately, the approach reduces synchronization overhead and load imbalance and makes better use of the cache.
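To make the synchronization cost concrete, here is a minimal sketch of a level-set (wavefront) inspector for a kernel with loop-carried dependencies. This is a standard baseline technique and not the inspection strategy the paper proposes; the function name `level_sets` and the CSR arrays `Lp` and `Li` are assumptions chosen for illustration.

```python
def level_sets(n, Lp, Li):
    """Group the rows of a lower-triangular CSR matrix into wavefronts.

    Row i depends on row j when L[i][j] != 0 and j < i. Rows in the same
    wavefront are independent and can run in parallel; a barrier is
    needed between consecutive wavefronts.
    (Illustrative baseline only, not the paper's inspector.)
    """
    level = [0] * n
    for i in range(n):
        for k in range(Lp[i], Lp[i + 1]):
            j = Li[k]
            if j < i:  # off-diagonal entry: row i must wait for row j
                level[i] = max(level[i], level[j] + 1)
    sets = [[] for _ in range(max(level) + 1)] if n else []
    for i, lvl in enumerate(level):
        sets[lvl].append(i)
    return sets


# Hypothetical 4x4 pattern: rows 0 and 1 are independent,
# row 2 depends on row 0, and row 3 depends on rows 1 and 2.
Lp = [0, 1, 2, 4, 7]
Li = [0, 1, 0, 2, 1, 2, 3]
wavefronts = level_sets(4, Lp, Li)
barriers = len(wavefronts) - 1
```

Each wavefront boundary is a synchronization point, and wavefronts of uneven size leave threads idle, which is the kind of overhead and imbalance the findings refer to.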
Abstract
Sparse fusion is a compile-time loop transformation and runtime scheduling implemented as a domain-specific code generator. Sparse fusion generates efficient parallel code for the combination of two sparse matrix kernels where at least one of the kernels has loop-carried dependencies. Available implementations optimize individual sparse kernels. When optimized separately, the irregular dependence patterns of sparse kernels create synchronization overheads and load imbalance, and their irregular memory access patterns result in inefficient cache usage, which reduces parallel efficiency. Sparse fusion uses a novel inspection strategy with code transformations to generate parallel fused code for sparse kernel combinations that is optimized for data locality and load balance. Code generated by Sparse fusion outperforms the existing implementations ParSy and MKL on average 1.6X and 5.1X…
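As an illustration of the kind of kernel combination the abstract describes, the sketch below pairs a sparse triangular solve (which has a loop-carried dependence) with a sparse matrix-vector product, first as two separate loops and then as one fused sequential loop. It shows only why fusion can improve locality; it does not reproduce the paper's parallel schedule or generated code. The kernel choice, the function names, and the storage layout (CSR for `L` with the diagonal stored last in each row, CSC for `A`) are assumptions.

```python
def sptrsv_csr(n, Lp, Li, Lx, b):
    """Solve L x = b for lower-triangular L in CSR (diagonal last in each row)."""
    x = [0.0] * n
    for i in range(n):
        s = b[i]
        for k in range(Lp[i], Lp[i + 1] - 1):
            s -= Lx[k] * x[Li[k]]  # reads x[j] with j < i: loop-carried dependence
        x[i] = s / Lx[Lp[i + 1] - 1]
    return x


def spmv_csc(n, Ap, Ai, Ax, x):
    """Compute y = A x for A in CSC; column j only needs x[j]."""
    y = [0.0] * n
    for j in range(n):
        for k in range(Ap[j], Ap[j + 1]):
            y[Ai[k]] += Ax[k] * x[j]
    return y


def fused_sptrsv_spmv(n, Lp, Li, Lx, b, Ap, Ai, Ax):
    """Fused loop: x[i] is consumed by column i of A right after it is produced."""
    x = [0.0] * n
    y = [0.0] * n
    for i in range(n):
        s = b[i]
        for k in range(Lp[i], Lp[i + 1] - 1):
            s -= Lx[k] * x[Li[k]]
        x[i] = s / Lx[Lp[i + 1] - 1]
        for k in range(Ap[i], Ap[i + 1]):  # reuse x[i] while it is still in cache
            y[Ai[k]] += Ax[k] * x[i]
    return x, y


# Hypothetical 3x3 example: L = [[2,0,0],[1,1,0],[0,3,4]], and A = L stored in CSC.
Lp, Li, Lx = [0, 1, 3, 5], [0, 0, 1, 1, 2], [2.0, 1.0, 1.0, 3.0, 4.0]
Ap, Ai, Ax = [0, 2, 4, 5], [0, 1, 1, 2, 2], [2.0, 1.0, 1.0, 3.0, 4.0]
b = [2.0, 3.0, 11.0]

x_sep = sptrsv_csr(3, Lp, Li, Lx, b)
y_sep = spmv_csc(3, Ap, Ai, Ax, x_sep)
x_fused, y_fused = fused_sptrsv_spmv(3, Lp, Li, Lx, b, Ap, Ai, Ax)
```

The fused loop is legal here because column `i` of `A` needs only `x[i]`, which is final at the end of iteration `i`. Running such a loop in parallel still requires a schedule that respects the dependence between iterations of the solve.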
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Interconnection Networks and Systems
