Porting numerical integration codes from CUDA to oneAPI: a case study

Ioannis Sakiotis; Kamesh Arumugam; Marc Paterno; Desh Ranjan; Balsa Terzic; Mohammad Zubair

arXiv:2302.05730·cs.DC·February 20, 2023·1 cites

TL;DR

This paper details the process and challenges of porting CUDA-based numerical integration codes to oneAPI, demonstrating that performance can be maintained within 10% of the original on Nvidia V100 GPUs.

Contribution

It provides a practical case study of porting optimized CUDA codes to oneAPI, highlighting challenges and solutions for maintaining performance.

Findings

01

The oneAPI ports are at most 10% slower than the CUDA implementations on the Nvidia V100 GPU

02

Addressed challenges include register usage, compiler optimizations, and library call mappings

03

Porting enables cross-platform compatibility with minimal performance loss

Abstract

We present our experience in porting optimized CUDA implementations to oneAPI. We focus on the use case of numerical integration, particularly the CUDA implementations of PAGANI and $m$-Cubes. We faced several challenges that caused performance degradation in the oneAPI ports. These include differences in utilized registers per thread, compiler optimizations, and mappings of CUDA library calls to oneAPI equivalents. After addressing those challenges, we tested both the PAGANI and $m$-Cubes integrators on numerous integrands of various characteristics. To evaluate the quality of the ports, we collected performance metrics of the CUDA and oneAPI implementations on the Nvidia V100 GPU. We found that the oneAPI ports often achieve comparable performance to the CUDA versions, and that they are at most 10% slower.

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Numerical Methods and Algorithms · Advanced Data Storage Technologies