Heterogeneous Highly Parallel Implementation of Matrix Exponentiation Using GPU
Chittampally Vasanth Raja, Srinivas Balasubramanian, Prakash S, Raghavendra

TL;DR
This paper presents a highly parallel GPU-based implementation of matrix exponentiation using OpenCL, achieving significant speedups over naive methods for scientific computing applications.
Contribution
It introduces a heterogeneous, highly parallel matrix exponentiation method optimized for GPGPUs, demonstrating substantial performance improvements.
Findings
1000X speedup over naive GPU kernel
44X speedup with optimized kernel
Effective for various matrix sizes and powers
Abstract
The vision of a supercomputer at every desk can be realized with powerful, highly parallel CPUs, GPUs, or APUs. Graphics processors, once specialized for graphics applications only, are now used for computationally intensive general-purpose applications. GFLOP- and TFLOP-level performance, once very expensive, has become cheap with GPGPUs. The current work focuses on a highly parallel implementation of matrix exponentiation, which is widely used across the scientific community, from highly critical flight and CAD simulations to financial and statistical applications. The proposed solution uses OpenCL to exploit the hyper-parallelism offered by many-core GPGPUs. It employs many general GPU optimizations as well as architecture-specific optimizations. This experimentation covers the optimizations targeted specific to the…
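To make the operation concrete, here is a minimal CPU-side sketch of raising a square matrix to an integer power. It assumes repeated squaring (exponentiation by squaring), which is a standard technique and not necessarily the scheme the paper's OpenCL kernels use; the function names `mat_mul` and `mat_pow` are hypothetical and chosen only for illustration.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    # Each output element c[i][j] depends only on row i of a and column j of b,
    # so all n*n elements can be computed independently. That independence is
    # what a GPU implementation would typically map onto parallel work-items.
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(a, power):
    """Raise square matrix a to a non-negative integer power.

    Assumption: repeated squaring, which needs O(log power) multiplications
    instead of the power - 1 multiplications of the naive loop.
    """
    if power < 0:
        raise ValueError("power must be non-negative")
    n = len(a)
    # Start from the identity matrix, the neutral element for multiplication.
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    base = [row[:] for row in a]
    while power > 0:
        if power & 1:          # current binary digit of the exponent is 1
            result = mat_mul(result, base)
        power >>= 1
        if power:              # skip the final, unused squaring
            base = mat_mul(base, base)
    return result
```

For example, `mat_pow([[1, 1], [1, 0]], 10)` yields `[[89, 55], [55, 34]]`, the Fibonacci matrix raised to the tenth power. In a GPU setting, the cost is dominated by `mat_mul`, so that is where kernel-level optimizations would matter most.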
