Third ArchEdge Workshop: Exploring the Design Space of Efficient Deep Neural Networks
Fuxun Yu, Dimitrios Stamoulis, Di Wang, Dimitrios Lymberopoulos, Xiang Chen

TL;DR
This paper explores the design space of efficient deep neural networks by combining static architecture profiling at the GPU core level with dynamic runtime traversal of feature map redundancy, aiming to improve accuracy-latency trade-offs.
Contribution
It introduces a novel full-stack GPU profiling approach for static architecture optimization and a new dynamic method for exploiting feature map redundancy during model execution.
Findings
Full-stack GPU profiling reveals better accuracy-latency trade-offs.
Dynamic feature map traversal improves runtime efficiency.
Highlights open research questions in DNN efficiency.
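Accuracy-latency trade-offs like these are usually compared on a Pareto front. A design stays on the front only if no other design is at least as accurate and at least as fast, and strictly better on one of the two. The sketch below is minimal and assumes each candidate's latency has already been measured, for example by GPU-level profiling of the kind the paper describes. The class name, function name, candidate labels, and numbers are all hypothetical and are not results from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    # Hypothetical record for one DNN design under consideration.
    name: str
    accuracy: float    # e.g. top-1 accuracy in %
    latency_ms: float  # measured (profiled) latency, not a FLOP-based proxy

def pareto_front(cands):
    """Return the candidates that no other candidate dominates in both accuracy and latency."""
    # Sort by latency (fastest first). On equal latency, put the more accurate design first.
    ordered = sorted(cands, key=lambda c: (c.latency_ms, -c.accuracy))
    front, best_acc = [], float("-inf")
    for c in ordered:
        # A slower design survives only if it beats every faster design on accuracy.
        if c.accuracy > best_acc:
            front.append(c)
            best_acc = c.accuracy
    return front

# Hypothetical profiled candidates (illustrative numbers only).
candidates = [
    Candidate("A", 70.0, 10.0),
    Candidate("B", 72.0, 12.0),
    Candidate("C", 71.0, 15.0),  # slower than B and less accurate, so dominated
    Candidate("D", 75.0, 20.0),
]
front = pareto_front(candidates)
print([c.name for c in front])  # ['A', 'B', 'D']
```

The sketch scores each candidate on its measured latency and never estimates latency from an analytical cost model. That fits the abstract's point that end-to-end hardware modeling assumptions can misjudge where the true trade-offs lie.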
Abstract
This paper gives an overview of our ongoing work on the design space exploration of efficient deep neural networks (DNNs). Specifically, we cover two aspects: (1) static architecture design efficiency and (2) dynamic model execution efficiency. For static architecture design, different from existing end-to-end hardware modeling assumptions, we conduct full-stack profiling at the GPU core level to identify better accuracy-latency trade-offs for DNN designs. For dynamic model execution, different from prior work that tackles model redundancy at the DNN-channels level, we explore a new dimension of DNN feature map redundancy to be dynamically traversed at runtime. Last, we highlight several open questions that are poised to draw research attention in the next few years.
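The abstract does not describe how feature-map redundancy is traversed at runtime. The sketch below is a generic, hypothetical illustration of runtime skipping based on feature-map content, and it is not the authors' method. It assumes the redundancy is measured along the spatial dimension: the input is split into tiles, and an operation runs only on tiles whose mean absolute activation across channels is above a threshold. The function names, tile size, and threshold are all illustrative.

```python
import numpy as np

def active_tile_mask(fmap, tile=4, threshold=0.1):
    """Flag spatial tiles whose mean |activation| across channels exceeds threshold."""
    c, h, w = fmap.shape
    if h % tile or w % tile:
        raise ValueError("H and W must be multiples of tile")
    # Reshape (C, H, W) -> (C, H/tile, tile, W/tile, tile) so each tile becomes its own block.
    blocks = np.abs(fmap).reshape(c, h // tile, tile, w // tile, tile)
    energy = blocks.mean(axis=(0, 2, 4))  # shape (H/tile, W/tile)
    return energy > threshold

def run_on_active_tiles(fmap, mask, tile, op):
    """Apply op only to tiles flagged active. Skipped tiles stay zero in the output."""
    out = np.zeros_like(fmap)
    for i, j in zip(*np.nonzero(mask)):
        ys = slice(i * tile, (i + 1) * tile)
        xs = slice(j * tile, (j + 1) * tile)
        out[:, ys, xs] = op(fmap[:, ys, xs])
    return out

# Toy feature map: 2 channels, 8x8, and only the top-left 4x4 region is active.
fmap = np.zeros((2, 8, 8), dtype=np.float32)
fmap[:, :4, :4] = 1.0
mask = active_tile_mask(fmap, tile=4, threshold=0.1)
out = run_on_active_tiles(fmap, mask, tile=4, op=lambda x: 2.0 * x)
print(mask.tolist())                          # [[True, False], [False, False]]
print(f"tiles computed: {mask.mean():.0%}")   # tiles computed: 25%
```

The toy operation is pointwise, so each tile can be processed on its own. A real convolution would also need halo pixels from neighboring tiles at every border. The per-tile Python loop only shows the control flow. On a GPU, any savings would depend on the skipped work matching how the hardware actually schedules computation.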
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Parallel Computing and Optimization Techniques
