COX: CUDA on X86 by Exposing Warp-Level Functions to CPUs

Ruobing Han; Jaewon Lee; Jaewoong Sim; Hyesoon Kim

arXiv:2112.10034 · cs.DC · December 21, 2021

Open Access

TL;DR

COX is a framework that enables efficient execution of modern CUDA programs on CPU platforms by exposing warp-level functions to CPUs, achieving high coverage and performance.

Contribution

The paper introduces hierarchical collapsing and a new LLVM pass to support CUDA warp-level functions on CPUs, significantly improving compatibility and efficiency.
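To make the idea concrete, the following is a minimal hand-written C++ sketch of how a warp-level reduction built on `__shfl_down_sync` might be emulated on a CPU with two nested loop levels: an outer loop over warps and inner loops over the 32 lanes of each warp. It is an illustration under stated assumptions, not COX's generated code or its LLVM pass. The function name `warp_sums`, the constant `kWarpSize`, and the per-lane arrays are hypothetical, and the input length is assumed to be a multiple of 32.

```cpp
#include <array>
#include <vector>

// Hypothetical constant: CUDA warps have 32 lanes.
constexpr int kWarpSize = 32;

// CPU emulation of one CUDA block in which every thread runs:
//   int v = in[threadIdx.x];
//   for (int off = 16; off > 0; off /= 2)
//       v += __shfl_down_sync(0xffffffff, v, off);
//   if (threadIdx.x % 32 == 0) out[threadIdx.x / 32] = v;
// Assumes in.size() is a multiple of kWarpSize.
std::vector<int> warp_sums(const std::vector<int>& in) {
    const int num_warps = static_cast<int>(in.size()) / kWarpSize;
    std::vector<int> out(num_warps);

    for (int warp = 0; warp < num_warps; ++warp) {       // inter-warp loop
        // The per-thread register 'v' becomes one array slot per lane.
        std::array<int, kWarpSize> v{};
        for (int lane = 0; lane < kWarpSize; ++lane)     // intra-warp loop
            v[lane] = in[warp * kWarpSize + lane];

        for (int off = kWarpSize / 2; off > 0; off /= 2) {
            // Every lane must read its source value before any lane
            // writes, so the shuffle is split into a read loop and an
            // update loop.
            std::array<int, kWarpSize> shuffled{};
            for (int lane = 0; lane < kWarpSize; ++lane) {
                const int src = lane + off;
                // An out-of-range source lane yields the caller's own value.
                shuffled[lane] = (src < kWarpSize) ? v[src] : v[lane];
            }
            for (int lane = 0; lane < kWarpSize; ++lane)
                v[lane] += shuffled[lane];
        }
        out[warp] = v[0];                                // lane 0 holds the sum
    }
    return out;
}
```

Calling `warp_sums` on 64 ones returns one partial sum per warp, that is, two values of 32. The short fixed-length lane loops are also the kind of code a compiler can map to SIMD instructions, which is the intuition behind running warp-level functions with AVX.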

Findings

01  Application coverage of 90%, compared with 68% for previous frameworks

02  Efficient execution of warp-level functions using CPU SIMD (AVX) instructions

03  High application coverage with performance comparable to previous frameworks

Abstract

As CUDA has become the de facto programming model for data-parallel applications such as high-performance computing and machine learning, running CUDA on other platforms has become a compelling option. Several efforts have attempted to support CUDA on devices other than NVIDIA GPUs, but because of the extra translation steps they involve, their support for CUDA's latest features always lags by a few years. Examples are DPC++ and HIPIFY, where CUDA source code has to be translated into the natively supported language before it can run. In particular, the new CUDA programming model exposes the warp concept in the programming language, which greatly changes the way CUDA code should be mapped to CPU programs. In this paper, hierarchical collapsing, which correctly supports CUDA warp-level functions on CPUs, is proposed. Based on hierarchical collapsing, a framework,…

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management · Advanced Data Storage Technologies