Accelerating X-Ray Tracing for Exascale Systems using Kokkos
Felix Wittwer (1), Nicholas K. Sauter (2), Derek Mendez (2), Billy K. Poon (2), Aaron S. Brewster (2), James M. Holton (2), Michael E. Wall (3), William E. Hart (4), Deborah J. Bard (1), Johannes P. Blaschke (1) ((1) National Energy Research Scientific Computing Center

TL;DR
This paper demonstrates how Kokkos enables performance-portable X-ray tracing across NVIDIA and AMD GPU architectures, achieving speed-ups of 13% to 66% over the original CUDA code while avoiding vendor lock-in on exascale systems.
Contribution
The paper ports a real-world X-ray tracing application to Kokkos, so that the same code runs on both NVIDIA A100 and AMD MI250X GPUs, and reports speed-ups over the original CUDA code.
Findings
Achieved 13-66% speed-up over original CUDA code.
Ran the same code on NERSC's Perlmutter Phase 1 (NVIDIA A100) and the testbed system for OLCF's Frontier (AMD MI250X).
Demonstrated performance portability using Kokkos.
Abstract
The upcoming exascale computing systems Frontier and Aurora will draw much of their computing power from GPU accelerators. The hardware for these systems will be provided by AMD and Intel, respectively, each supporting their own GPU programming model. The challenge for applications that harness one of these exascale systems will be to avoid lock-in and to preserve performance portability. We report here on our results of using Kokkos to accelerate a real-world application on NERSC's Perlmutter Phase 1 (using NVIDIA A100 accelerators) and the testbed system for OLCF's Frontier (using AMD MI250X). By porting to Kokkos, we were able to successfully run the same X-ray tracing code on both systems and achieved speed-ups between 13% and 66% compared to the original CUDA code. These results are a highly encouraging demonstration of using Kokkos to accelerate production science code.
Taxonomy
Topics: Distributed and Parallel Computing Systems · Parallel Computing and Optimization Techniques · Scientific Computing and Data Management
