A Performance Comparison of CUDA and OpenCL
Kamran Karimi, Neil G. Dickson, and Firas Hamze

TL;DR
This paper compares CUDA and OpenCL GPU programming frameworks using a Quantum Monte Carlo application, analyzing performance differences in data transfer, kernel execution, and overall application runtime.
Contribution
The paper provides a detailed performance comparison between CUDA and OpenCL on NVIDIA and ATI hardware, using near-identical kernels from a real application.
Findings
Converting a CUDA kernel to OpenCL requires only minimal modifications when using NVIDIA's compiler tools; making the same kernel compile with ATI's build tools requires more.
Performance differences depend on hardware and toolchain; OpenCL may show longer data transfer times to and from the GPU.
End-to-end application performance differs between CUDA and OpenCL depending on the device and compiler.
Abstract
CUDA and OpenCL are two different frameworks for GPU programming. OpenCL is an open standard that can be used to program CPUs, GPUs, and other devices from different vendors, while CUDA is specific to NVIDIA GPUs. Although OpenCL promises a portable language for GPU programming, its generality may entail a performance penalty. In this paper, we use complex, near-identical kernels from a Quantum Monte Carlo application to compare the performance of CUDA and OpenCL. We show that when using NVIDIA compiler tools, converting a CUDA kernel to an OpenCL kernel involves minimal modifications. Making such a kernel compile with ATI's build tools involves more modifications. Our performance tests measure and compare data transfer times to and from the GPU, kernel execution times, and end-to-end application execution times for both CUDA and OpenCL.
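To give a sense of what "minimal modifications" can mean in practice, the following is a minimal sketch in Python that applies a few well-known one-to-one correspondences between CUDA and OpenCL C kernel syntax. It is only an illustration and not the authors' conversion procedure: the `CUDA_TO_OPENCL` table, the `cuda_to_opencl` function, and the `scale` kernel are hypothetical names introduced here, and plain text substitution does not cover everything a real port needs (for example, OpenCL requires an address-space qualifier such as `__global` on kernel pointer arguments).

```python
# Illustrative mapping of common CUDA kernel constructs to their OpenCL C
# counterparts. Assumption: this table is a small hand-picked subset and is
# not taken from the paper.
CUDA_TO_OPENCL = {
    "__global__": "__kernel",                         # kernel entry point
    "__shared__": "__local",                          # on-chip shared memory
    "__syncthreads()": "barrier(CLK_LOCAL_MEM_FENCE)",  # work-group barrier
    "threadIdx.x": "get_local_id(0)",                 # index within block
    "blockIdx.x": "get_group_id(0)",                  # block / work-group index
    "blockDim.x": "get_local_size(0)",                # threads per block
    "gridDim.x": "get_num_groups(0)",                 # blocks in the grid
}


def cuda_to_opencl(source: str) -> str:
    """Apply each textual substitution in turn (hypothetical helper).

    This does NOT add the `__global` qualifier that OpenCL requires on
    kernel pointer parameters; that step still has to be done by hand.
    """
    for cuda_token, opencl_token in CUDA_TO_OPENCL.items():
        source = source.replace(cuda_token, opencl_token)
    return source


# A made-up CUDA kernel used only to exercise the mapping.
cuda_kernel = """__global__ void scale(float *data, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    data[i] *= factor;
}"""

opencl_kernel = cuda_to_opencl(cuda_kernel)
print(opencl_kernel)
```

The substitutions leave the kernel body structurally unchanged, which is consistent with the abstract's claim that the two versions can be kept near-identical. Any differences a particular vendor's OpenCL compiler demands beyond this would have to be handled separately.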
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Algorithms and Data Compression · Advanced Data Storage Technologies
