A Practical GPU-Accelerated Implementation of Orthogonal Matching Pursuit
Ariel Lubonja, Sebastian Kazmarek Praesius, Trac Duy Tran

TL;DR
This paper presents a GPU-accelerated implementation of Orthogonal Matching Pursuit (OMP) that runs up to 310x faster than Scikit-Learn and up to 26x faster than SPAMS by exploiting Cholesky inverse properties and modern linear algebra kernels.
Contribution
The authors developed a highly efficient GPU-based OMP implementation that exploits Cholesky inverse properties and modern linear algebra kernels, achieving substantial speedups.
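To illustrate why a Cholesky factorization is useful in OMP, here is a minimal NumPy sketch of the well-known incremental Cholesky update: when one column is added to the selected set, the factor of the Gram matrix can be extended by one row instead of being recomputed. This is a generic textbook technique, not necessarily the authors' exact method; the function name `cholesky_append` and its arguments are hypothetical.

```python
import numpy as np

def cholesky_append(L, A_S, a_new):
    """Extend the lower-triangular factor L, where L @ L.T == A_S.T @ A_S,
    to cover the selected columns A_S plus one new column a_new.
    (Hypothetical helper for illustration only.)"""
    k = L.shape[0]
    # Inner products of the new column with the already-selected columns.
    v = A_S.T @ a_new
    # Solve L w = v. A dedicated triangular solver would be cheaper;
    # the generic solver is used here to keep the sketch NumPy-only.
    w = np.linalg.solve(L, v) if k else np.zeros(0)
    # New diagonal entry; assumes a_new is linearly independent of A_S.
    d = np.sqrt(a_new @ a_new - w @ w)
    L_new = np.zeros((k + 1, k + 1))
    L_new[:k, :k] = L
    L_new[k, :k] = w
    L_new[k, k] = d
    return L_new
```

Each update costs on the order of k^2 operations for k selected columns, which is the reason this kind of update is attractive inside a greedy loop that adds one column per iteration.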
Findings
Up to 310x speedup over Scikit-Learn
Up to 26x speedup over SPAMS
Fully compatible with scikit-learn and available on PyPI
Abstract
Finding the sparsest solution to the underdetermined system y = Ax, given a tolerance, is known to be NP-hard. Many approximate solutions to this problem exist, and Orthogonal Matching Pursuit (OMP) is one of the most widely used. However, existing OMP implementations do not take full advantage of matrix properties or of modern CPU- and GPU-based linear algebra kernels. In this paper, we present an efficient implementation of OMP that leverages Cholesky inverse properties as well as the power of GPUs to deliver up to 310x speedup over Scikit-Learn and 26x over SPAMS. The package is published on PyPI (pip install batched-omp) and is fully scikit-learn compatible.
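For readers unfamiliar with OMP, the following is a minimal, unoptimized NumPy sketch of the basic greedy algorithm. It is a reference illustration only, not the GPU implementation described in the abstract and not the API of the batched-omp package. The notation (dictionary `A`, signal `y`, sparse coefficients `x`) and the function name `omp` are assumptions made for this example.

```python
import numpy as np

def omp(A, y, n_nonzero_coefs, tol=1e-10):
    """Naive OMP sketch: greedily pick columns of A to approximate y.
    Returns x with at most n_nonzero_coefs nonzero entries.
    (Hypothetical reference function, not the paper's implementation.)"""
    y = np.asarray(y, dtype=float)
    residual = y.copy()
    support = []          # indices of selected columns
    coef = np.zeros(0)    # coefficients for the selected columns
    for _ in range(n_nonzero_coefs):
        # Stop early once the residual is within the tolerance.
        if np.linalg.norm(residual) <= tol:
            break
        # Select the column most correlated with the current residual.
        correlations = np.abs(A.T @ residual)
        if support:
            correlations[support] = 0.0  # never reselect a column
        support.append(int(np.argmax(correlations)))
        # Re-fit all selected coefficients jointly by least squares.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    if support:
        x[support] = coef
    return x
```

The least-squares re-fit at every iteration is the expensive step in this naive form; optimized implementations typically replace it with factorization updates and batch many signals together.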
