FuseFlow: A Fusion-Centric Compilation Framework for Sparse Deep Learning on Streaming Dataflow
Rubens Lacouture, Nathan Zhang, Ritvik Sharma, Marco Siracusa, Fredrik Kjolstad, Kunle Olukotun, Olivia Hsu

TL;DR
FuseFlow is a compiler that converts sparse deep learning models written in PyTorch into fused sparse dataflow graphs for reconfigurable dataflow architectures (RDAs). It targets a cycle-accurate dataflow simulator, which allows fusion strategies to be analyzed at the microarchitectural level.
Contribution
FuseFlow is the first compiler to support general cross-expression fusion of sparse operations. Beyond fusion across kernels (expressions), it also supports parallelization, dataflow ordering, and sparsity blocking.
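To make "cross-expression fusion" concrete, the following is a minimal plain-Python sketch and not FuseFlow's implementation or API. It assumes a simplified sparse attention with no softmax: expression 1 computes masked scores (Q times K-transpose, kept only where a sparsity mask allows), and expression 2 multiplies those scores by V. The function names, the dictionary-based mask format, and the toy data are all hypothetical and chosen for illustration.

```python
# Illustrative only: a plain-Python model of fusing two sparse expressions.
# mask maps each row index i to the column indices j that are kept.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unfused(mask, Q, K, V):
    # Expression 1: materialize every kept score as an intermediate.
    scores = {(i, j): dot(Q[i], K[j]) for i, cols in mask.items() for j in cols}
    # Expression 2: read the intermediate back and multiply by V.
    d = len(V[0])
    out = [[0.0] * d for _ in Q]
    for (i, j), s in scores.items():
        for c in range(d):
            out[i][c] += s * V[j][c]
    return out, len(scores)  # result, number of intermediate values stored

def fused(mask, Q, K, V):
    # Both expressions share one loop nest over the mask's nonzeros.
    d = len(V[0])
    out = [[0.0] * d for _ in Q]
    for i, cols in mask.items():
        for j in cols:
            s = dot(Q[i], K[j])  # produced and consumed immediately
            for c in range(d):
                out[i][c] += s * V[j][c]
    return out, 0  # no intermediate is stored

# Hypothetical toy inputs.
Q = [[1, 0], [0, 1]]
K = [[1, 2], [3, 4]]
V = [[1, 0], [0, 1]]
mask = {0: [0], 1: [0, 1]}

out_unfused, stored_unfused = unfused(mask, Q, K, V)
out_fused, stored_fused = fused(mask, Q, K, V)
```

Both versions compute the same output; the fused one never stores the intermediate score matrix. This toy model says nothing about when fusion pays off on real hardware, which is the question the paper's design-space exploration addresses.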
Findings
Achieves up to 2.7x speedup on GPT-3 with BigBird sparse attention.
Demonstrates that full fusion is not always optimal for sparse models.
Provides a heuristic for pruning suboptimal fusion configurations.
Abstract
As deep learning models scale, sparse computation and specialized dataflow hardware have emerged as powerful solutions to address efficiency. We propose FuseFlow, a compiler that converts sparse machine learning models written in PyTorch to fused sparse dataflow graphs for reconfigurable dataflow architectures (RDAs). FuseFlow is the first compiler to support general cross-expression fusion of sparse operations. In addition to fusion across kernels (expressions), FuseFlow also supports optimizations like parallelization, dataflow ordering, and sparsity blocking. It targets a cycle-accurate dataflow simulator for microarchitectural analysis of fusion strategies. We use FuseFlow for design-space exploration across four real-world machine learning applications with sparsity, showing that full fusion (entire cross-expression fusion across all computation in an end-to-end model) is not…
Taxonomy
Topics: Embedded Systems Design Techniques · Parallel Computing and Optimization Techniques · Scientific Computing and Data Management
