Porting numerical integration codes from CUDA to oneAPI: a case study
Ioannis Sakiotis, Kamesh Arumugam, Marc Paterno, Desh Ranjan, Balsa Terzic, Mohammad Zubair

TL;DR
This paper details the process and challenges of porting CUDA-based numerical integration codes to oneAPI, demonstrating that performance can be maintained within 10% of the original on Nvidia V100 GPUs.
Contribution
It provides a practical case study of porting optimized CUDA codes to oneAPI, highlighting challenges and solutions for maintaining performance.
Findings
oneAPI ports are at most 10% slower than the CUDA implementations on the Nvidia V100 GPU
Addressed challenges include register usage, compiler optimizations, and library call mappings
Porting enables cross-platform compatibility with minimal performance loss
Abstract
We present our experience in porting optimized CUDA implementations to oneAPI. We focus on the use case of numerical integration, particularly the CUDA implementations of PAGANI and m-Cubes. We faced several challenges that caused performance degradation in the oneAPI ports. These include differences in utilized registers per thread, compiler optimizations, and mappings of CUDA library calls to oneAPI equivalents. After addressing those challenges, we tested both the PAGANI and m-Cubes integrators on numerous integrands of various characteristics. To evaluate the quality of the ports, we collected performance metrics of the CUDA and oneAPI implementations on the Nvidia V100 GPU. We found that the oneAPI ports often achieve comparable performance to the CUDA versions, and that they are at most 10% slower.
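The "at most 10% slower" claim can be read as a bound on the relative slowdown of the oneAPI port with respect to the CUDA version. The snippet below is a minimal sketch of that arithmetic, not the authors' benchmarking code: the function names (`relative_slowdown`, `worst_slowdown`), the integrand labels, and the timings are all hypothetical and chosen only for illustration.

```python
def relative_slowdown(t_cuda, t_oneapi):
    """Fractional slowdown of the oneAPI port relative to CUDA.

    A value of 0.10 means the oneAPI run took 10% longer than the CUDA run;
    a negative value means the oneAPI run was faster.
    """
    if t_cuda <= 0:
        raise ValueError("CUDA execution time must be positive")
    return (t_oneapi - t_cuda) / t_cuda


def worst_slowdown(timings):
    """Largest relative slowdown over a dict of (t_cuda, t_oneapi) pairs."""
    return max(relative_slowdown(c, o) for c, o in timings.values())


# Hypothetical execution times in milliseconds. These are NOT measurements
# from the paper; they only illustrate the calculation.
timings = {
    "integrand_a": (100.0, 104.0),
    "integrand_b": (250.0, 270.0),
}

print(f"worst-case slowdown: {worst_slowdown(timings):.1%}")
```

Under this reading, a port meets the reported bound when the worst-case value over all tested integrands does not exceed 0.10.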
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Numerical Methods and Algorithms · Advanced Data Storage Technologies
