ZnG: Architecting GPU Multi-Processors with New Flash for Scalable Data Analysis

Jie Zhang; Myoungsoo Jung

arXiv:2006.08975·cs.AR·June 17, 2020

Open Access

TL;DR

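ZnG is a GPU-SSD integrated architecture that replaces all GPU internal DRAM with an ultra-low-latency SSD to maximize memory capacity. It offsets the SSD's performance penalties with a high-throughput flash network, SSD firmware integrated into the GPU's MMU, and request buffering in a large L2 cache and flash registers, achieving 7.5x higher performance than prior work.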

Contribution

The paper presents a GPU-SSD integrated architecture that integrates SSD firmware into the GPU's MMU and replaces flash channels with a high-throughput flash network, maximizing GPU memory capacity while addressing the performance penalties of an SSD.

Findings

01

Achieves 7.5x higher performance than prior work.

02

Replaces all GPU internal DRAM with an ultra-low-latency SSD to maximize memory capacity.

03

Uses a large L2 cache and flash registers to buffer memory requests.

Abstract

We propose ZnG, a new GPU-SSD integrated architecture that maximizes the memory capacity of a GPU and addresses the performance penalties imposed by an SSD. Specifically, ZnG replaces all GPU internal DRAMs with an ultra-low-latency SSD to maximize the GPU memory capacity. ZnG further removes the performance bottleneck of the SSD by replacing its flash channels with a high-throughput flash network and by integrating the SSD firmware into the GPU's MMU to reap the benefits of hardware acceleration. Although the flash arrays within the SSD can deliver high aggregate bandwidth, only a small fraction of that bandwidth can be utilized by the GPU's memory requests because of a mismatch in access granularity. To address this, ZnG employs a large L2 cache and flash registers to buffer the memory requests. Our evaluation results indicate that ZnG can achieve 7.5x higher performance than prior work.
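The granularity mismatch mentioned in the abstract can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the paper: the function names, the 128-byte request size, and the 4096-byte flash page size are all assumptions chosen for illustration. It only shows why serving small requests with whole-page flash reads wastes most of the transferred bandwidth, and how buffering a page so that several requests are served from one read recovers it.

```python
def useful_fraction(request_bytes: int, page_bytes: int) -> float:
    """Fraction of a flash page read that is actually consumed when
    every memory request triggers its own whole-page read."""
    if request_bytes <= 0 or page_bytes <= 0:
        raise ValueError("sizes must be positive")
    return min(request_bytes, page_bytes) / page_bytes


def useful_fraction_with_buffer(request_bytes: int, page_bytes: int,
                                requests_per_page: int) -> float:
    """Same fraction when a buffer (hypothetically, a cache or register)
    holds the page so that `requests_per_page` distinct requests are
    served from a single page read."""
    if requests_per_page <= 0:
        raise ValueError("requests_per_page must be positive")
    if request_bytes <= 0 or page_bytes <= 0:
        raise ValueError("sizes must be positive")
    used = min(request_bytes * requests_per_page, page_bytes)
    return used / page_bytes


# Illustrative sizes only (assumptions, not figures from the paper).
REQUEST_BYTES = 128
PAGE_BYTES = 4096

if __name__ == "__main__":
    print(useful_fraction(REQUEST_BYTES, PAGE_BYTES))
    print(useful_fraction_with_buffer(REQUEST_BYTES, PAGE_BYTES, 16))
```

Under these assumed sizes, an unbuffered request uses only 1/32 of each page read, while a buffer that serves 16 requests per page read uses half of it. The actual sizes and buffering policy in ZnG are described in the paper itself.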

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Interconnection Networks and Systems