Dissecting the Graphcore IPU Architecture via Microbenchmarking
Zhe Jia, Blake Tillman, Marco Maggioni, Daniele Paolo Scarpazza

TL;DR
This paper provides an in-depth microbenchmarking analysis of Graphcore's IPU architecture, revealing how its design influences performance and offering models to predict application efficiency.
Contribution
The paper introduces purpose-built microbenchmarks that characterize the IPU's memory, interconnect, and compute performance, and it derives insights and simple predictive models for AI/ML workloads.
Findings
IPU's memory and interconnect performance characterized
Performance models for predicting application behavior developed
Architectural insights inform optimization strategies
Abstract
This report focuses on the architecture and performance of the Intelligence Processing Unit (IPU), a novel, massively parallel platform recently introduced by Graphcore and aimed at Artificial Intelligence/Machine Learning (AI/ML) workloads. We dissect the IPU's performance behavior using microbenchmarks that we crafted for the purpose. We study the IPU's memory organization and performance. We study the latency and bandwidth that the on-chip and off-chip interconnects offer, both in point-to-point transfers and in a spectrum of collective operations, under diverse loads. We evaluate the IPU's compute power over matrix multiplication, convolution, and AI/ML primitives. We discuss actual performance in comparison with its theoretical limits. Our findings reveal how the IPU's architectural design affects its performance. Moreover, they offer simple mental models to predict an…
Taxonomy
Topics: Advanced Memory and Neural Computing · Parallel Computing and Optimization Techniques · Ferroelectric and Negative Capacitance Devices
