DNNFusion: Accelerating Deep Neural Networks Execution with Advanced Operator Fusion
Wei Niu, Jiexiong Guan, Yanzhi Wang, Gagan Agrawal, Bin Ren

TL;DR
DNNFusion is a novel framework that significantly enhances operator fusion in deep neural network execution, leading to substantial speedups and memory reductions suitable for mobile and real-time applications.
Contribution
It introduces a new operator classification and graph rewriting approach to expand fusion opportunities beyond existing pattern-based methods.
Findings
Achieves up to 8.8x more fusion opportunities
Outperforms state-of-the-art frameworks with 9.3x speedup
Reduces memory requirements for DNN inference
Abstract
Deep Neural Networks (DNNs) have emerged as the core enabler of many major applications on mobile devices. To achieve high accuracy, DNN models have become increasingly deep with hundreds or even thousands of operator layers, leading to high memory and computational requirements for inference. Operator fusion (or kernel/layer fusion) is a key optimization in many state-of-the-art DNN execution frameworks, such as TensorFlow, TVM, and MNN. However, these frameworks usually adopt fusion approaches based on certain patterns that are too restrictive to cover the diversity of operators and layer connections. Polyhedral-based loop fusion techniques, on the other hand, work on a low-level view of the computation without operator-level information, and can also miss potential fusion opportunities. To address this challenge, this paper proposes a novel and extensive loop fusion framework called…
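To make the idea of operator fusion concrete, here is a minimal sketch in plain Python. It is a generic illustration of fusing a chain of element-wise operators, not DNNFusion's algorithm; the function names `unfused` and `fused` and the choice of a Mul, Add, ReLU chain are assumptions made only for this example.

```python
def unfused(x, scale, bias):
    """Run three element-wise operators one after another.

    Each operator makes its own pass over the data and materializes
    a full intermediate list (t1, t2) before the next one starts.
    """
    t1 = [v * scale for v in x]        # operator 1: Mul
    t2 = [v + bias for v in t1]        # operator 2: Add
    return [max(v, 0.0) for v in t2]   # operator 3: ReLU


def fused(x, scale, bias):
    """Compute the same result in a single pass.

    The three operators are merged into one loop body, so no
    intermediate lists are allocated between them.
    """
    return [max(v * scale + bias, 0.0) for v in x]


if __name__ == "__main__":
    data = [-2.0, -0.5, 0.0, 1.5, 3.0]
    # Both versions apply the same arithmetic in the same order,
    # so their outputs are identical.
    print(unfused(data, 2.0, 1.0) == fused(data, 2.0, 1.0))
```

The fused version reads the input once and writes the output once, which is where the memory and latency savings of fusion come from in general. Deciding which operators can legally and profitably be merged is the harder problem that the abstract describes.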
