COX: CUDA on X86 by Exposing Warp-Level Functions to CPUs
Ruobing Han, Jaewon Lee, Jaewoong Sim, Hyesoon Kim

TL;DR
COX is a framework that enables efficient execution of modern CUDA programs on CPU platforms by exposing warp-level functions to CPUs, achieving high coverage and performance.
Contribution
The paper introduces hierarchical collapsing and a new LLVM pass to support CUDA warp-level functions on CPUs, significantly improving compatibility and efficiency.
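To make the idea concrete, here is a minimal hand-written sketch, under our own assumptions, of what hierarchical collapsing does to a block of CUDA threads. The threads are turned into two nested loops: an outer inter-warp loop over warps and inner intra-warp loops over the 32 lanes of each warp. A warp-level function (here `__shfl_down_sync`) becomes a boundary between intra-warp loops, so that every lane's value is available before any lane reads another lane's value. The function name `warp_sum_block` and the buffer layout are hypothetical. The paper performs this transformation automatically as an LLVM pass on IR and does not hand-write C like this.

```c
#define WARP_SIZE 32

/* Hypothetical hand-collapsed CPU version of this CUDA kernel:
 *
 *   __global__ void warp_sum(const int *in, int *out) {
 *     int v = in[threadIdx.x];
 *     for (int offset = 16; offset > 0; offset /= 2)
 *       v += __shfl_down_sync(0xffffffff, v, offset);
 *     if (threadIdx.x % 32 == 0) out[threadIdx.x / 32] = v;
 *   }
 *
 * One call runs one thread block of block_dim threads.
 * Assumption: block_dim is a multiple of WARP_SIZE and all lanes are active. */
static void warp_sum_block(const int *in, int *out, int block_dim) {
    int v[WARP_SIZE];   /* thread-private variable v, one slot per lane */
    int tmp[WARP_SIZE]; /* values fetched by the shuffle */
    int num_warps = block_dim / WARP_SIZE;

    for (int w = 0; w < num_warps; ++w) {          /* inter-warp loop */
        /* intra-warp loop: code before the first warp-level function */
        for (int lane = 0; lane < WARP_SIZE; ++lane)
            v[lane] = in[w * WARP_SIZE + lane];

        for (int offset = 16; offset > 0; offset /= 2) {
            /* __shfl_down_sync: lane reads v from lane + offset; lanes whose
             * source would fall outside the warp keep their own value */
            for (int lane = 0; lane < WARP_SIZE; ++lane)
                tmp[lane] = (lane + offset < WARP_SIZE) ? v[lane + offset]
                                                        : v[lane];
            /* intra-warp loop: the "v +=" that follows the shuffle */
            for (int lane = 0; lane < WARP_SIZE; ++lane)
                v[lane] += tmp[lane];
        }

        /* intra-warp loop: code after the warp-level functions;
         * threadIdx.x % 32 == lane because threadIdx.x = w * 32 + lane */
        for (int lane = 0; lane < WARP_SIZE; ++lane)
            if (lane == 0)
                out[w] = v[lane];
    }
}
```

A naive "flat" collapse into a single loop over all threads would execute thread 0 to completion before thread 1 starts, so thread 0's shuffle would read values that other lanes have not yet computed. Splitting the intra-warp loop at each warp-level call is what avoids this. With `in[i] = i` for 64 threads, the two warps should produce 496 and 1520.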
Findings
Application coverage of 90%, compared with 68% for previous frameworks
Efficient execution of warp-level functions using CPU SIMD (AVX) instructions
Support for recent CUDA features with performance comparable to previous frameworks
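The following is a minimal sketch of why warp-level functions map well onto CPU SIMD. It assumes all 32 lanes of a warp are stored side by side in an array, with one element per lane. Under that assumption, each warp-level function becomes a fixed-length loop over 32 lanes. A compiler may auto-vectorize such a loop with AVX, depending on compiler and flags. The helper names (`warp_ballot`, `warp_any`, `warp_all`, `warp_shfl`) are hypothetical and are not COX's actual runtime API. The paper's own implementation may use explicit AVX intrinsics instead.

```c
#include <stdint.h>

#define WARP_SIZE 32

/* Hypothetical CPU emulations of CUDA warp-level functions. Each call
 * processes a whole warp: arrays hold one element per lane. The mask
 * argument of the CUDA *_sync variants is ignored here; all 32 lanes are
 * assumed active. */

/* __ballot_sync: bit i of the result is set iff lane i's predicate is nonzero */
static uint32_t warp_ballot(const int pred[WARP_SIZE]) {
    uint32_t bits = 0;
    for (int lane = 0; lane < WARP_SIZE; ++lane)
        bits |= (uint32_t)(pred[lane] != 0) << lane;
    return bits;
}

/* __any_sync and __all_sync, derived from the ballot */
static int warp_any(const int pred[WARP_SIZE]) {
    return warp_ballot(pred) != 0u;
}
static int warp_all(const int pred[WARP_SIZE]) {
    return warp_ballot(pred) == 0xFFFFFFFFu;
}

/* __shfl_sync with width == WARP_SIZE: lane i receives var from
 * lane (src[i] mod 32) */
static void warp_shfl(const int var[WARP_SIZE], const int src[WARP_SIZE],
                      int out[WARP_SIZE]) {
    for (int lane = 0; lane < WARP_SIZE; ++lane)
        out[lane] = var[src[lane] & (WARP_SIZE - 1)];
}
```

Storing lanes contiguously is the design choice that makes this work. With that layout, a warp becomes a short vector, and operations such as ballot or shuffle become element-wise loops or gathers over that vector, rather than communication between separately scheduled CPU threads.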
Abstract
As CUDA becomes the de facto programming model for data-parallel applications such as high-performance computing and machine learning, running CUDA on other platforms has become a compelling option. Several efforts have attempted to support CUDA on devices other than NVIDIA GPUs. However, the extra translation steps they require leave their support a few years behind CUDA's latest features. Examples include DPC++ and HIPIFY, in which CUDA source code must first be translated into each platform's native language before it can be supported. In particular, the new CUDA programming model exposes the warp concept in the programming language, which greatly changes the way CUDA code should be mapped to CPU programs. In this paper, hierarchical collapsing, which correctly supports CUDA warp-level functions on CPUs, is proposed. Based on hierarchical collapsing, a framework,…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management · Advanced Data Storage Technologies
