Giga-scale Kernel Matrix Vector Multiplication on GPU
Robert Hu, Siu Lun Chau, Dino Sejdinovic, Joan Alexis Glaunès

TL;DR
This paper introduces F³M (Faster-Fast and Free Memory Method), an approximation method for kernel matrix-vector multiplication (KMVM) with empirically linear time and memory complexity, enabling billion-point computations on a high-end GPU with significant speedups.
Contribution
The paper presents F³M, an approximation procedure for KMVM on tall and skinny data that scales empirically linearly in time and memory on GPU hardware.
Findings
F³M shows empirically linear time and memory complexity with a small relative error.
It can compute a full KMVM for a billion points in under a minute on a high-end GPU.
Used as a drop-in inside an existing GPU-based linear solver, it improves speed by 1.5-5.5 times with minimal accuracy loss.
Abstract
Kernel matrix-vector multiplication (KMVM) is a foundational operation in machine learning and scientific computing. However, as KMVM tends to scale quadratically in both memory and time, applications are often limited by these computational constraints. In this paper, we propose a novel approximation procedure coined Faster-Fast and Free Memory Method (F³M) to address these scaling issues of KMVM for tall and skinny data. Extensive experiments demonstrate that F³M has empirical linear time and memory complexity with a small relative error and can compute a full KMVM for a billion points in under a minute on a high-end GPU, leading to a significant speed-up in comparison to existing CPU methods. We demonstrate the utility of our procedure by applying it as a drop-in for the state-of-the-art GPU-based linear…
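To make the quadratic scaling mentioned in the abstract concrete, here is a minimal NumPy sketch of an exact KMVM, v = K(X, Y) b. It is not the paper's approximation method; it only illustrates the baseline that such a method aims to speed up. The Gaussian kernel, the lengthscale parameter, and the function names gaussian_kernel, kmvm_dense, and kmvm_chunked are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kernel(x, y, lengthscale=1.0):
    """Dense Gaussian (RBF) kernel block between rows of x and rows of y."""
    # Squared distances via |a - b|^2 = |a|^2 + |b|^2 - 2 a.b
    sq = (x**2).sum(1)[:, None] + (y**2).sum(1)[None, :] - 2.0 * x @ y.T
    # Clamp tiny negative values caused by floating-point round-off
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * lengthscale**2))

def kmvm_dense(x, y, b, lengthscale=1.0):
    """Exact KMVM that materialises the full n-by-m kernel matrix."""
    return gaussian_kernel(x, y, lengthscale) @ b

def kmvm_chunked(x, y, b, lengthscale=1.0, chunk=256):
    """Exact KMVM computed one block of rows at a time.

    Time is still O(n*m); only the peak memory drops to O(chunk*m).
    """
    out = np.empty(x.shape[0], dtype=np.result_type(x, b))
    for start in range(0, x.shape[0], chunk):
        stop = start + chunk
        out[start:stop] = gaussian_kernel(x[start:stop], y, lengthscale) @ b
    return out

# Illustrative data (hypothetical): many points, few dimensions
rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 3))
b = rng.standard_normal(1000)

v_dense = kmvm_dense(x, x, b)
v_chunked = kmvm_chunked(x, x, b, chunk=128)
max_abs_diff = float(np.max(np.abs(v_dense - v_chunked)))
```

Both functions return the same vector up to floating-point round-off. Chunking over rows removes the quadratic memory cost but leaves the quadratic time cost untouched, which is why an approximation is needed to reach linear time as well.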
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Gaussian Processes and Bayesian Inference
