Dissecting the NVidia Turing T4 GPU via Microbenchmarking
Zhe Jia, Marco Maggioni, Jeffrey Smith, Daniele Paolo Scarpazza

TL;DR
This paper uses microbenchmarks to analyze the NVidia Turing T4 GPU in detail, documenting its architectural features, its instruction set and memory hierarchy, and its performance gains over previous generations, to help software designers optimize their code.
Contribution
It offers the first comprehensive microarchitectural dissection of the Turing T4 GPU, highlighting new instructions, memory hierarchy details, and performance characteristics compared to prior NVidia GPUs.
Findings
Turing introduces new instructions for matrix math.
The T4 GPU has larger caches than the Pascal-based P4.
Performance benchmarks show substantial improvements over the P4.
Abstract
In 2019, the rapid rate at which GPU manufacturers refresh their designs, coupled with their reluctance to disclose microarchitectural details, is still a hurdle for those software designers who want to extract the highest possible performance. Last year, these very reasons motivated us to dissect the Volta GPU architecture using microbenchmarks. The introduction in August 2018 of Turing, NVidia's latest architecture, pressed us to update our study. In this report, we examine Turing and compare it quantitatively against previous NVidia GPU generations. Specifically, we study the T4 GPU: a low-power board aiming at inference applications. We describe its improvements against its inference-oriented predecessor: the P4 GPU based on the Pascal architecture. Both T4 and P4 GPUs achieve significantly higher frequency-per-Watt figures than their full-size counterparts. We study the…
Taxonomy
Topics: Ferroelectric and Negative Capacitance Devices · Advanced Memory and Neural Computing · Parallel Computing and Optimization Techniques
