FADiff: Fusion-Aware Differentiable Optimization for DNN Scheduling on Tensor Accelerators
Shuao Jia, Zichao Ling, Chen Bai, Kang Zhao, Jianwang Zhai

TL;DR
FADiff is a gradient-based framework that automatically optimizes intra-layer mapping and inter-layer fusion strategies for DNN deployment on tensor accelerators, significantly improving energy efficiency and reducing latency.
Contribution
This work introduces a differentiable cost model and a gradient-based optimization method for joint DNN mapping and fusion, advancing automated hardware-aware DNN deployment.
Findings
Outperforms existing methods in energy efficiency
Reduces inference latency significantly
Effectively explores complex design space
Abstract
Efficient deployment of Deep Neural Networks (DNNs), such as Large Language Models (LLMs), on tensor accelerators is essential for maximizing computational efficiency in modern AI systems. However, achieving this is challenging due to the enormous and complex design space created by the interaction of intra-layer mapping and inter-layer fusion. In this work, we present FADiff, a gradient-based optimization framework capable of automatically identifying high-quality intra-layer mapping and inter-layer fusion strategies to accelerate inference for DNN workloads. We first construct a unified and differentiable analytical cost model, which accurately predicts the energy and latency of both single-layer mappings and various layer fusion strategies. Then, by encoding discrete constraints into the loss function, we employ a gradient-based approach to efficiently explore the vast design space,…
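The idea of folding discrete constraints into a differentiable loss can be illustrated with a deliberately small sketch. This is not FADiff's cost model or optimizer. It assumes a single loop of extent N tiled by a power-of-two factor, a made-up two-term cost (refill traffic that falls with tile size, buffer cost that grows with it), and a sin² penalty that pulls the relaxed exponent toward an integer. The constants `N`, `A`, `B` and the helper names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy constants (not taken from the paper).
N = 64    # extent of the loop being tiled
A = 1.0   # cost per off-chip refill; refill count scales as N / tile
B = 0.7   # cost per element of on-chip buffer; scales as tile


def cost(x):
    """Relaxed analytical cost for tile = 2**x, where x is a real number."""
    tile = 2.0 ** x
    return A * N / tile + B * tile


def cost_grad(x):
    """d(cost)/dx, derived by hand from cost()."""
    tile = 2.0 ** x
    return math.log(2.0) * (-A * N / tile + B * tile)


def penalty_grad(x):
    """Gradient of the penalty sin^2(pi * x); the penalty is zero at integer x."""
    return math.pi * math.sin(2.0 * math.pi * x)


def optimize_tile(x0=1.0, steps=2000, lr=0.01, lam_max=5.0):
    """Gradient descent on cost + lam * penalty, then round to a legal tile."""
    x = x0
    for step in range(steps):
        # Anneal the penalty weight from 0 toward lam_max so the search first
        # follows the relaxed cost and only later commits to an integer exponent.
        lam = lam_max * step / steps
        x -= lr * (cost_grad(x) + lam * penalty_grad(x))
        # Keep the tile within [1, N].
        x = min(max(x, 0.0), math.log2(N))
    return x, 2 ** round(x)


x_relaxed, tile = optimize_tile()
print(f"relaxed exponent = {x_relaxed:.3f}, chosen tile = {tile}")
```

With these toy constants the unconstrained optimum lies between two legal tile sizes, and the neighbouring power-of-two candidates cost 13.6 (tile 8) and 15.2 (tile 16). The annealed penalty is one simple way to let the search settle on a legal point. A real framework would use automatic differentiation over many coupled mapping and fusion variables rather than a hand-derived gradient over a single exponent.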
Taxonomy
Topics: Advanced Neural Network Applications · Tensor decomposition and applications · Generative Adversarial Networks and Image Synthesis
