A Practical GPU-Accelerated Implementation of Orthogonal Matching Pursuit

Ariel Lubonja; Sebastian Kazmarek Praesius; Trac Duy Tran

arXiv:2407.06434·cs.DC·April 1, 2026·2 cites

TL;DR

This paper presents a GPU-accelerated implementation of Orthogonal Matching Pursuit (OMP) that exploits matrix properties and modern linear algebra kernels to compute sparse solutions much faster than existing implementations such as Scikit-Learn and SPAMS.

Contribution

The authors developed a highly efficient GPU-based OMP implementation that exploits Cholesky inverse properties and modern linear algebra kernels, achieving substantial speedups.
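To illustrate why a Cholesky factorization is useful in OMP at all, here is a minimal NumPy sketch of the textbook rank-one "append" update: when one column joins the support, the factor of the Gram matrix grows by one row instead of being recomputed. This is a generic technique and is not claimed to be the paper's inverse-Cholesky scheme; the function name `cholesky_append` and its arguments are hypothetical.

```python
import numpy as np

def cholesky_append(L, A_support, a_new):
    """Extend a Cholesky factor when one column is added to the support.

    Assumes L is lower triangular with L @ L.T == A_support.T @ A_support.
    Returns the lower-triangular factor of the Gram matrix of
    [A_support, a_new]. Names and signature are illustrative only.
    """
    k = L.shape[0]
    if k == 0:
        # First column: the Gram matrix is the 1x1 matrix [||a_new||^2].
        return np.array([[np.linalg.norm(a_new)]])
    # Solve L w = A_support^T a_new. A general solver is used here for
    # simplicity; a triangular solver would exploit the structure of L.
    w = np.linalg.solve(L, A_support.T @ a_new)
    diag_sq = a_new @ a_new - w @ w
    if diag_sq <= 0.0:
        raise ValueError("new column is (numerically) dependent on the support")
    L_new = np.zeros((k + 1, k + 1))
    L_new[:k, :k] = L            # old factor is reused unchanged
    L_new[k, :k] = w             # new off-diagonal row
    L_new[k, k] = np.sqrt(diag_sq)
    return L_new
```

Each update costs one solve against the existing factor, so the work per OMP iteration depends on the current support size rather than on refactorizing the whole Gram matrix.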

Findings

01. Up to 310x speedup over Scikit-Learn

02. Up to 26x speedup over SPAMS

03. Fully compatible with scikit-learn and available on PyPI

Abstract

Finding the sparsest solution to the underdetermined system $y = Ax$, given a tolerance, is known to be NP-hard. Many approximate solutions to this problem exist, and Orthogonal Matching Pursuit (OMP) is one of the most widely used. However, existing OMP implementations do not take full advantage of matrix properties or of modern CPU- and GPU-based linear algebra kernels. In this paper, we present an efficient implementation of OMP that leverages Cholesky inverse properties as well as the power of GPUs to deliver up to 310x speedup over Scikit-Learn and 26x over SPAMS. The package is published on PyPI (pip install batched-omp) and is fully scikit-learn compatible.
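For readers unfamiliar with OMP, the greedy loop can be sketched in a few lines of NumPy: pick the column of $A$ most correlated with the current residual, refit $y$ on all selected columns by least squares, and repeat. This is a plain CPU reference version for illustration, not the paper's GPU implementation and not the API of the batched-omp package; the function name `omp` and its parameters are assumptions.

```python
import numpy as np

def omp(A, y, n_nonzero, tol=1e-10):
    """Naive OMP sketch: return x with at most n_nonzero nonzeros, A @ x ~ y."""
    n_features = A.shape[1]
    residual = np.asarray(y, dtype=float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        if np.linalg.norm(residual) <= tol:
            break  # y is already explained to within the tolerance
        # Select the column most correlated with the residual.
        correlations = np.abs(A.T @ residual)
        if support:
            correlations[support] = 0.0  # never re-select a chosen column
        support.append(int(np.argmax(correlations)))
        # Refit on the whole support (this is the "orthogonal" step).
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(n_features)
    x[support] = coef
    return x

if __name__ == "__main__":
    # Small synthetic demo: 3-sparse signal, 20 measurements, 50 atoms.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 50))
    A /= np.linalg.norm(A, axis=0)  # unit-norm columns
    x_true = np.zeros(50)
    x_true[[4, 17, 33]] = [1.5, -2.0, 0.7]
    x_hat = omp(A, A @ x_true, n_nonzero=3)
    print("selected atoms:", np.flatnonzero(x_hat))
    print("residual norm:", np.linalg.norm(A @ (x_true - x_hat)))
```

The least-squares refit inside the loop is the expensive step in this naive form, since it is solved from scratch at every iteration.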
