Giga-scale Kernel Matrix Vector Multiplication on GPU

Robert Hu; Siu Lun Chau; Dino Sejdinovic; Joan Alexis Glaunès

arXiv:2202.01085 · math.NA · February 25, 2025

TL;DR

This paper introduces F3M (Faster-Fast and Free Memory Method), an approximation method for kernel matrix-vector multiplication (KMVM) on tall, low-dimensional data. It shows empirically linear time and memory complexity, which makes billion-point computations feasible on a single high-end GPU.

Contribution

The paper presents F3M, a new approximation technique for KMVM that empirically scales linearly in time and memory for tall ($10^{8} \sim 10^{9}$ points) and skinny ($D \leq 7$) datasets on GPU hardware.

Findings

01

F3M achieves empirically linear time and memory complexity with a relative error of order $10^{-3}$.

02

It can compute a full billion-point KMVM in under a minute on a high-end GPU.

03

It improves existing GPU-based solvers' speed by 1.5-5.5 times with minimal accuracy loss.

Abstract

Kernel matrix-vector multiplication (KMVM) is a foundational operation in machine learning and scientific computing. However, as KMVM tends to scale quadratically in both memory and time, applications are often limited by these computational constraints. In this paper, we propose a novel approximation procedure coined Faster-Fast and Free Memory Method (F3M) to address these scaling issues of KMVM for tall ($10^{8} \sim 10^{9}$) and skinny ($D \leq 7$) data. Extensive experiments demonstrate that F3M has empirical linear time and memory complexity with a relative error of order $10^{-3}$ and can compute a full KMVM for a billion points in under a minute on a high-end GPU, leading to a significant speed-up in comparison to existing CPU methods. We demonstrate the utility of our procedure by applying it as a drop-in for the state-of-the-art GPU-based linear…
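
To make the scaling problem concrete, the sketch below computes an exact KMVM, $v_i = \sum_j k(x_i, y_j)\, b_j$, in row blocks. It is not F3M and is not taken from the authors' repository; it is only a baseline illustration. The Gaussian kernel, the function names `gaussian_kmvm` and `relative_error`, and the parameters `lengthscale` and `block_size` are all assumptions made for this example. Blocking keeps peak memory at roughly `block_size * m` kernel entries rather than `n * m`, but the running time stays quadratic, which is the cost an approximation method has to remove.

```python
import numpy as np


def gaussian_kmvm(X, Y, b, lengthscale=1.0, block_size=1024):
    """Exact KMVM v = K(X, Y) @ b for a Gaussian kernel, in row blocks.

    Illustrative baseline only (assumed kernel and names): time is O(n * m),
    peak memory for kernel entries is O(block_size * m).
    """
    n = X.shape[0]
    out = np.empty((n,) + b.shape[1:], dtype=np.result_type(X, b))
    y_sq = np.sum(Y * Y, axis=1)  # squared norms of the columns' points
    for start in range(0, n, block_size):
        xb = X[start:start + block_size]
        # Squared distances via ||x||^2 + ||y||^2 - 2 <x, y>.
        sq = np.sum(xb * xb, axis=1)[:, None] + y_sq[None, :] - 2.0 * xb @ Y.T
        np.maximum(sq, 0.0, out=sq)  # clip small negatives from round-off
        out[start:start + block_size] = np.exp(-sq / (2.0 * lengthscale**2)) @ b
    return out


def relative_error(approx, exact):
    """One common relative-error measure: ||approx - exact|| / ||exact||."""
    return np.linalg.norm(approx - exact) / np.linalg.norm(exact)
```

An approximate KMVM could be checked against this baseline on a small subsample with `relative_error(approx, gaussian_kmvm(X, Y, b))`. Whether this matches the exact error definition used in the paper is an assumption.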

Code & Models

Repositories

MrHuff/F3M
PyTorch · Official

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Gaussian Processes and Bayesian Inference

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings