Kokkos Kernels: Performance Portable Sparse/Dense Linear Algebra and Graph Kernels
Sivasankaran Rajamanickam, Seher Acer, Luc Berger-Vergiat, Vinh Dang, Nathan Ellingwood, Evan Harvey, Brian Kelley, Christian R. Trott, Jeremiah Wilke, Ichitaro Yamazaki

TL;DR
Kokkos Kernels is a performance portable library providing a suite of sparse/dense linear algebra and graph kernels designed for evolving hardware architectures, demonstrating consistent performance across diverse platforms.
Contribution
This paper introduces Kokkos Kernels, a library that offers portable, high-performance kernels for linear algebra and graph computations across different hardware architectures.
Findings
Demonstrated portable performance of four sparse kernels
Achieved high efficiency with three dense batched kernels
Validated performance of two graph kernels and a team-level algorithm
Abstract
As hardware architectures evolve in the push towards exascale, developing Computational Science and Engineering (CSE) applications depends on performance portable approaches for sustainable software development. This paper describes one aspect of performance portability: developing a portable library of kernels that serves the needs of several CSE applications and software frameworks. We describe Kokkos Kernels, a library of sparse linear algebra, dense linear algebra, and graph kernels. We describe the design principles of such a library and demonstrate its portable performance using selected kernels. Specifically, we demonstrate the performance of four sparse kernels, three dense batched kernels, two graph kernels, and one team-level algorithm.
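To make "sparse kernel" concrete, the following is a minimal serial sketch of a sparse matrix-vector product (y = A * x) over a compressed sparse row (CSR) matrix. It is an illustration only: the abstract does not name which four sparse kernels were measured, the `CsrMatrix` struct and `spmv` function are hypothetical names invented for this example, and none of this is the Kokkos Kernels API. A performance portable library would typically run the outer row loop in parallel on the target device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSR container for illustration; NOT a Kokkos Kernels type.
struct CsrMatrix {
  std::size_t num_rows;
  std::size_t num_cols;
  std::vector<std::size_t> row_ptr;  // size num_rows + 1; row i spans [row_ptr[i], row_ptr[i+1])
  std::vector<std::size_t> col_idx;  // column index of each stored entry
  std::vector<double> values;        // value of each stored entry
};

// Serial reference for y = A * x. Each row is independent, which is what
// makes the outer loop a natural candidate for parallel execution.
std::vector<double> spmv(const CsrMatrix& A, const std::vector<double>& x) {
  assert(A.row_ptr.size() == A.num_rows + 1);
  assert(x.size() == A.num_cols);

  std::vector<double> y(A.num_rows, 0.0);
  for (std::size_t i = 0; i < A.num_rows; ++i) {
    double sum = 0.0;
    // Accumulate only the stored (nonzero) entries of row i.
    for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) {
      sum += A.values[k] * x[A.col_idx[k]];
    }
    y[i] = sum;
  }
  return y;
}
```

For example, the 2x3 matrix with rows (1, 0, 2) and (0, 3, 0) is stored as `row_ptr = {0, 2, 3}`, `col_idx = {0, 2, 1}`, `values = {1.0, 2.0, 3.0}`; multiplying it by a vector of ones sums each row's stored entries.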
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Low-power high-performance VLSI design · Error Correcting Code Techniques
