DPU: DAG Processing Unit for Irregular Graphs with Precision-Scalable Posit Arithmetic in 28nm
Nimish Shah, Laura Isabel Galindez Olascoaga, Shirui Zhao, Wannes Meert, Marian Verhelst

TL;DR
The paper introduces DPU, a specialized 28nm processor optimized for irregular DAG workloads, achieving significant speedup and power efficiency improvements over CPUs and GPUs through parallel execution and scalable posit arithmetic.
Contribution
The paper presents DPU, a novel architecture with parallel compute units and scalable posit arithmetic, tailored for efficient irregular DAG processing, outperforming traditional hardware.
Findings
Achieves speedups of 5.1× over a CPU and 20.6× over a GPU.
Operates within a 0.23 W power budget.
Enables low-power execution of irregular DAG workloads.
Abstract
Computation in several real-world applications, such as probabilistic machine learning, sparse linear algebra, and robotic navigation, can be modeled as irregular directed acyclic graphs (DAGs). The irregular data dependencies in DAGs pose challenges to parallel execution on general-purpose CPUs and GPUs, resulting in severe under-utilization of the hardware. This paper proposes DPU, a specialized processor designed for the efficient execution of irregular DAGs. The DPU is equipped with parallel compute units that execute different subgraphs of a DAG independently. The compute units can synchronize within a cycle using a hardware-supported synchronization primitive, and communicate via an efficient interconnect to a global banked scratchpad. Furthermore, a precision-scalable posit arithmetic unit is developed to enable application-dependent precision. The DPU is taped out in 28nm CMOS,…
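To make the execution model in the abstract more concrete, here is a minimal Python sketch of evaluating a small DAG with several independent workers and a synchronization point between dependent stages. It is an illustration only, not the DPU's compiler or hardware mapping: it uses simple level-by-level scheduling with round-robin assignment, and the names `schedule_levels`, `run_dag`, and `num_units` are hypothetical. A plain dictionary stands in for the shared scratchpad.

```python
from collections import defaultdict
from operator import add, mul


def schedule_levels(preds):
    """Group DAG nodes into levels so that each node's predecessors
    all sit in strictly earlier levels."""
    level = {}

    def depth(node):
        if node not in level:
            # Nodes without predecessors land on level 0.
            level[node] = 1 + max((depth(p) for p in preds[node]), default=-1)
        return level[node]

    for node in preds:
        depth(node)

    levels = defaultdict(list)
    for node, d in level.items():
        levels[d].append(node)
    return [sorted(levels[d]) for d in sorted(levels)]


def run_dag(preds, ops, inputs, num_units=2):
    """Evaluate the DAG level by level.

    Assumption: within a level, nodes are split round-robin across
    `num_units` hypothetical compute units; the dict update at the end
    of each level plays the role of a barrier.
    """
    values = dict(inputs)  # stands in for the shared scratchpad
    for nodes in schedule_levels(preds):
        work = [nodes[u::num_units] for u in range(num_units)]
        results = {}
        for unit_nodes in work:  # units do not depend on each other here
            for node in unit_nodes:
                if node in values:
                    continue  # input node, already has a value
                args = (values[p] for p in preds[node])
                results[node] = ops[node](*args)
        values.update(results)  # barrier: publish before the next level
    return values


# Example DAG computing (a + b) * (b + c).
preds = {"a": [], "b": [], "c": [],
         "s1": ["a", "b"], "s2": ["b", "c"],
         "out": ["s1", "s2"]}
ops = {"s1": add, "s2": add, "out": mul}
result = run_dag(preds, ops, {"a": 1, "b": 2, "c": 3})
print(result["out"])
```

In this toy graph the two additions share a level and could run on separate units, while the multiplication has to wait for both. Irregular DAGs tend to have uneven levels and scattered operand accesses, which is the kind of structure the abstract says leaves general-purpose CPUs and GPUs under-utilized.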
