DPU-v2: Energy-efficient execution of irregular directed acyclic graphs
Nimish Shah, Wannes Meert, and Marian Verhelst

TL;DR
This paper introduces DPU-v2, a specialized processor architecture designed to execute irregular DAG workloads energy-efficiently, with speedups of 1.4x to 14x over a DAG ASIP, a CPU, and a GPU.
Contribution
The work presents a novel hardware-software co-designed architecture optimized for irregular DAGs, together with a dedicated compiler and a design space exploration targeting energy-efficient execution.
Findings
Achieves 1.4x speedup over DAG ASIP
Achieves 3.5x speedup over CPU
Achieves 14x speedup over GPU
Abstract
A growing number of applications like probabilistic machine learning, sparse linear algebra, robotic navigation, etc., exhibit irregular data flow computation that can be modeled with directed acyclic graphs (DAGs). The irregularity arises from the seemingly random connections of nodes, which makes the DAG structure unsuitable for vectorization on CPU or GPU. Moreover, the nodes usually represent a small number of arithmetic operations that cannot amortize the overhead of launching tasks/kernels for each node, further posing challenges for parallel execution. To enable energy-efficient execution, this work proposes DAG processing unit (DPU) version 2, a specialized processor architecture optimized for irregular DAGs with static connectivity. It consists of a tree-structured datapath for efficient data reuse, a customized banked register file, and interconnects tuned to support…
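To make the abstract's point about irregular DAGs concrete, here is a minimal Python sketch. It is not the authors' code or compiler. It models a static DAG in which each node is a single add or multiply. The node names, input values, and level-by-level schedule are hypothetical illustrations. Grouping nodes by their longest-path depth yields a valid execution order. It also shows how few independent operations each step offers, which suggests why such graphs vectorize poorly and why per-node kernel launches are hard to amortize.

```python
import operator
from collections import defaultdict

# Hypothetical op set: every DAG node performs one binary arithmetic operation.
OPS = {"add": operator.add, "mul": operator.mul}


def levels(dag, inputs):
    """Longest-path depth of each node: inputs are level 0, a node is 1 + max(level of its operands).

    Uses recursion for brevity, so it is only suitable for small, shallow toy graphs.
    """
    memo = {name: 0 for name in inputs}

    def level(n):
        if n not in memo:
            _, a, b = dag[n]
            memo[n] = 1 + max(level(a), level(b))
        return memo[n]

    for n in dag:
        level(n)
    return memo


def evaluate(dag, inputs):
    """Evaluate the DAG level by level; return node values and the number of nodes per level."""
    lv = levels(dag, inputs)
    by_level = defaultdict(list)
    for n in dag:
        by_level[lv[n]].append(n)

    values = dict(inputs)
    widths = []
    for depth in sorted(by_level):
        wave = by_level[depth]          # nodes in one level are mutually independent
        widths.append(len(wave))
        for n in wave:
            op, a, b = dag[n]
            values[n] = OPS[op](values[a], values[b])
    return values, widths


# Hypothetical toy DAG with "irregular" wiring: operands come from arbitrary earlier nodes.
inputs = {"x0": 0.5, "x1": 0.25, "x2": 2.0, "x3": 4.0}
dag = {
    "n0": ("mul", "x0", "x3"),
    "n1": ("add", "x1", "x2"),
    "n2": ("mul", "n0", "x1"),
    "n3": ("add", "n1", "n2"),
    "n4": ("mul", "n3", "x0"),
    "n5": ("add", "n4", "n0"),
}

values, widths = evaluate(dag, inputs)
print(values["n5"], widths)  # prints: 3.375 [2, 1, 1, 1, 1]
```

In this toy graph, only the first level exposes two independent operations, and every later level holds a single node. Larger DAGs from the workloads listed above may expose more parallelism per level, but the connections between levels remain irregular.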
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Graph Theory and Algorithms · Cloud Computing and Resource Management
