A High-Throughput Solver for Marginalized Graph Kernels on GPU
Yu-Hang Tang, Oguz Selvitopi, Doru Popovici, Aydın Buluç

TL;DR
This paper introduces a high-throughput GPU solver for marginalized graph kernels. It accelerates kernel evaluation on large graph datasets by computing the tensor product linear system on the fly and reusing data staged in registers and shared memory.
Contribution
The paper presents a GPU solver that computes marginalized graph kernels by forming the tensor product on the fly and exploiting hierarchical sparsity, achieving a speedup of three to four orders of magnitude over CPU solvers.
Findings
Achieves 3-4 orders of magnitude speedup over CPU solvers.
Enables kernel-based learning on large-scale graph datasets.
Demonstrates effectiveness on synthetic and real-world graphs.
Abstract
We present the design and optimization of a linear solver on General Purpose GPUs for the efficient and high-throughput evaluation of the marginalized graph kernel between pairs of labeled graphs. The solver implements a preconditioned conjugate gradient (PCG) method to compute the solution to a generalized Laplacian equation associated with the tensor product of two graphs. To cope with the gap between the instruction throughput and the memory bandwidth of current generation GPUs, our solver forms the tensor product linear system on-the-fly without storing it in memory when performing matrix-vector dot product operations in PCG. Such on-the-fly computation is accomplished by using threads in a warp to cooperatively stream the adjacency and edge label matrices of individual graphs by small square matrix blocks called tiles, which are then staged in registers and the shared memory for…
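The on-the-fly idea in the abstract can be illustrated with a minimal CPU sketch in plain Python. It is not the authors' GPU implementation: it assumes unlabeled graphs (the edge label matrices are omitted), replaces the generalized Laplacian with a simplified stand-in `diag - kron(A, B)` whose diagonal is chosen only to make the system symmetric positive definite, and uses a Jacobi (diagonal) preconditioner. The names `kron_matvec`, `pcg`, and `apply_op` are hypothetical.

```python
def kron_matvec(A, B, x):
    """Compute kron(A, B) @ x without ever storing kron(A, B).

    Entry ((i, j), (k, l)) of the product matrix is A[i][k] * B[j][l],
    so each contribution is rebuilt from the two small matrices on demand.
    """
    n, m = len(A), len(B)
    y = [0.0] * (n * m)
    for i in range(n):
        for k in range(n):
            a = A[i][k]
            if a == 0:
                continue  # skip absent edges of the first graph
            for j in range(m):
                s = sum(B[j][l] * x[k * m + l] for l in range(m))
                y[i * m + j] += a * s
    return y


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def pcg(apply_op, diag, b, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradient with a diagonal preconditioner.

    apply_op(v) returns the operator applied to v; the operator itself
    is never materialized as a matrix.
    """
    x = [0.0] * len(b)
    r = list(b)
    z = [ri / di for ri, di in zip(r, diag)]
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        Ap = apply_op(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:
            break
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Two toy graphs: a triangle and a single edge (adjacency matrices).
A = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
B = [[0, 1], [1, 0]]

# Assumed diagonal: product of vertex degrees plus 1, which makes
# diag - kron(A, B) strictly diagonally dominant and hence SPD.
deg_A = [sum(row) for row in A]
deg_B = [sum(row) for row in B]
diag = [dA * dB + 1.0 for dA in deg_A for dB in deg_B]


def apply_op(v):
    Kv = kron_matvec(A, B, v)
    return [d * vi - kvi for d, vi, kvi in zip(diag, v, Kv)]


rhs = [1.0] * len(diag)
solution = pcg(apply_op, diag, rhs)
```

The point of the sketch is that `pcg` only ever calls `apply_op`, so memory use stays proportional to the two input graphs rather than to their tensor product. The tiling, warp-level cooperation, and register and shared-memory staging described in the abstract are GPU-specific and are not modeled here.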
