DNNFuser: Generative Pre-Trained Transformer as a Generalized Mapper for Layer Fusion in DNN Accelerators

Sheng-Chun Kao; Xiaoyu Huang; Tushar Krishna

arXiv:2201.11218 · cs.LG · June 8, 2022 · 5 cites

Open Access

TL;DR

DNNFuser is a one-shot, inference-based Transformer mapper for layer fusion in DNN accelerators. It finds mappings comparable in quality to those of a search-based mapper while running 66x-127x faster.

Contribution

To the authors' knowledge, DNNFuser is the first one-shot, inference-based mapper for inter-layer fusion in DNN accelerators. It uses a Transformer and generalizes to conditions not seen during training.

Findings

01 Achieves performance comparable to search-based mappers

02 Runs 66x-127x faster than a search-based mapper at inference time

03 Generalizes to unseen layer-fusion scenarios

Abstract

Dataflow/mapping decides the compute and energy efficiency of DNN accelerators. Many mappers have been proposed to tackle the intra-layer map-space. However, mappers for the inter-layer map-space (aka the layer-fusion map-space) have rarely been discussed. In this work, we propose a mapper, DNNFuser, that focuses specifically on this layer-fusion map-space. While existing SOTA DNN mapping explorations rely on search-based mappers, this is the first work, to the best of our knowledge, to propose a one-shot inference-based mapper. We leverage a Transformer as our DNN architecture to learn layer-fusion optimization as a sequence modeling problem. Further, the trained DNNFuser can generalize its knowledge and infer new solutions for unseen conditions. Within one inference pass, DNNFuser can infer solutions with performance comparable to the ones found by a highly optimized search-based mapper while…
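The abstract's framing of layer fusion as a sequence modeling problem can be pictured roughly as follows. This is an illustrative sketch only, not the authors' implementation: the `(weight_size, activation_size)` layer descriptor, the fuse/no-fuse decision encoding, the `buffer_size` condition, and `greedy_policy` (a hand-written stand-in occupying the slot where a trained Transformer would sit) are all assumptions made for the example.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical layer descriptor: (weight_size, activation_size).
Layer = Tuple[int, int]
# Hypothetical policy signature: (layers, decisions so far, boundary index, buffer) -> fuse?
Policy = Callable[[Sequence[Layer], List[bool], int, int], bool]


def decode_fusion(layers: Sequence[Layer], policy: Policy, buffer_size: int) -> List[bool]:
    """Emit one decision per layer boundary in a single left-to-right pass.

    decisions[i] is True when layer i is fused with layer i + 1. Each decision
    is conditioned on the layers, the earlier decisions, and the buffer size,
    which is what makes this a sequence-modeling formulation.
    """
    decisions: List[bool] = []
    for i in range(len(layers) - 1):
        decisions.append(policy(layers, decisions, i, buffer_size))
    return decisions


def greedy_policy(layers: Sequence[Layer], decisions: List[bool], i: int, buffer_size: int) -> bool:
    """Stand-in for a learned model (assumption): fuse while the group's weights fit."""
    # Walk back to the first layer of the fused group that currently ends at layer i.
    start = i
    while start > 0 and decisions[start - 1]:
        start -= 1
    # Total weights if layer i + 1 joined the group.
    group_weights = sum(w for w, _ in layers[start:i + 2])
    return group_weights <= buffer_size


def groups_from_decisions(decisions: Sequence[bool]) -> List[List[int]]:
    """Turn boundary decisions into lists of fused layer indices."""
    groups: List[List[int]] = [[0]]
    for i, fuse in enumerate(decisions):
        if fuse:
            groups[-1].append(i + 1)
        else:
            groups.append([i + 1])
    return groups


layers = [(4, 8), (4, 8), (4, 8), (4, 8)]
decisions = decode_fusion(layers, greedy_policy, buffer_size=8)
print(groups_from_decisions(decisions))
```

Swapping `greedy_policy` for a trained sequence model would leave the decoding loop unchanged. That loop is the sense in which an inference-based mapper produces a mapping in one pass rather than by iterating over candidate solutions.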

Taxonomy

Topics: Advanced Data Storage Technologies · Advanced Neural Network Applications · Particle Detector Development and Performance

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Label Smoothing · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Transformer · Cosine Annealing · Weight Decay · Softmax