Kernel Operations on the GPU, with Autodiff, without Memory Overflows
Benjamin Charlier, Jean Feydy, Joan Alexis Glaunès, François-David Collin, Ghislain Durif

TL;DR
KeOps is a GPU library for fast, memory-efficient computation with kernel and distance matrices. It supports automatic differentiation, scales to data sets with millions of samples without memory overflows, and outperforms standard GPU baselines such as PyTorch CUDA tensors, Halide and TVM.
Contribution
KeOps combines optimized C++/CUDA schemes with bindings for high-level languages (Python with NumPy and PyTorch, Matlab and GNU R), and alleviates the memory bottleneck of tensor-centric libraries for kernel computations.
Findings
Outperforms standard GPU baselines, including PyTorch CUDA tensors and the Halide and TVM libraries.
Scales to large data sets, with millions of samples processed in seconds.
Provides automatic differentiation and bindings for Python (NumPy and PyTorch), Matlab and GNU R.
Abstract
The KeOps library provides fast and memory-efficient GPU support for tensors whose entries are given by a mathematical formula, such as kernel and distance matrices. KeOps alleviates the major bottleneck of tensor-centric libraries for kernel and geometric applications: memory consumption. It also supports automatic differentiation and outperforms standard GPU baselines, including PyTorch CUDA tensors and the Halide and TVM libraries. KeOps combines optimized C++/CUDA schemes with binders for high-level languages: Python (NumPy and PyTorch), Matlab and GNU R. As a result, high-level "quadratic" codes can now scale up to large data sets, with millions of samples processed in seconds. KeOps brings graphics-like performance to kernel methods and is freely available on standard repositories (PyPI, CRAN). To showcase its versatility, we provide tutorials in a wide range of settings online…
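The memory bottleneck described in the abstract can be illustrated with a minimal NumPy sketch. It is not KeOps code and does not use the KeOps API: it only shows the general idea of reducing a kernel matrix block by block instead of storing all of its entries at once. The function name `gaussian_kernel_matvec`, the choice of a Gaussian kernel, and the parameters `sigma` and `block_size` are assumptions made for this example.

```python
import numpy as np


def gaussian_kernel_matvec(x, y, b, sigma=1.0, block_size=1024):
    """Illustrative sketch (hypothetical helper, not part of KeOps).

    Computes a_i = sum_j exp(-|x_i - y_j|^2 / (2 sigma^2)) * b_j
    without ever storing the full (M, N) kernel matrix: only a
    (block_size, N) slice is held in memory at any time.
    """
    x = np.asarray(x, dtype=np.float64)  # (M, D) target points
    y = np.asarray(y, dtype=np.float64)  # (N, D) source points
    b = np.asarray(b, dtype=np.float64)  # (N,) or (N, E) signal
    out = np.zeros((x.shape[0],) + b.shape[1:])
    y_sq = (y ** 2).sum(axis=1)  # squared norms of the y_j, shape (N,)

    for start in range(0, x.shape[0], block_size):
        xb = x[start:start + block_size]
        # Squared distances for this block of rows, shape (block, N).
        d2 = (xb ** 2).sum(axis=1)[:, None] - 2.0 * (xb @ y.T) + y_sq[None, :]
        np.maximum(d2, 0.0, out=d2)  # clip small negative round-off
        # Apply the kernel formula and reduce over j immediately.
        out[start:start + block_size] = np.exp(-d2 / (2.0 * sigma ** 2)) @ b
    return out
```

With M rows processed `block_size` at a time, peak memory for the kernel entries grows with `block_size * N` rather than `M * N`, which is the property that lets "quadratic" computations run on inputs whose full kernel matrix would not fit in memory.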
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Computational Physics and Python Applications · Tensor decomposition and applications
