FlashSketch: Sketch-Kernel Co-Design for Fast Sparse Sketching on GPUs

Rajat Vadiraj Dwaraknath; Sungyoon Kim; Mert Pilanci

arXiv:2602.06071 · cs.DC · February 9, 2026

Open Access

TL;DR

FlashSketch co-designs a sparse sketch family, BlockPerm-SJLT, with an optimized CUDA kernel. The pairing speeds up sketching for randomized linear algebra on modern GPUs (about 1.7x over prior GPU sketches) while maintaining sketch quality.

Contribution

The paper presents BlockPerm-SJLT, a new sparse sketch family whose sparsity structure is chosen to suit GPUs, and FlashSketch, an optimized CUDA kernel that implements it. A tunable parameter trades sketching robustness against GPU efficiency.

Findings

01 Achieves ~1.7x speedup over prior GPU sketches.

02 Provides theoretical guarantees under the oblivious subspace embedding (OSE) framework.

03 Exposes a tunable parameter that balances sketching robustness against GPU efficiency.

Abstract

Sparse sketches such as the sparse Johnson-Lindenstrauss transform are a core primitive in randomized numerical linear algebra because they leverage random sparsity to reduce the arithmetic cost of sketching, while still offering strong approximation guarantees. Their random sparsity, however, is at odds with efficient implementations on modern GPUs, since it leads to irregular memory access patterns that degrade memory bandwidth utilization. Motivated by this tension, we pursue a sketch-kernel co-design approach: we design a new family of sparse sketches, BlockPerm-SJLT, whose sparsity structure is chosen to enable FlashSketch, a corresponding optimized CUDA kernel that implements these sketches efficiently. The design of BlockPerm-SJLT introduces a tunable parameter that explicitly trades off the tension between GPU-efficiency and sketching robustness. We provide theoretical…
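
To make the tension described in the abstract concrete, here is a minimal sketch of a generic sparse Johnson-Lindenstrauss transform in plain Python. It is not BlockPerm-SJLT or the FlashSketch kernel, whose structure is not described on this page. The function names (make_sjlt, apply_sjlt) and the parameters k (sketch dimension), d (input dimension), and s (nonzeros per column) are assumptions chosen for illustration.

```python
import math
import random


def make_sjlt(k, d, s, seed=0):
    """Build a k x d sparse sign sketch, stored column by column.

    Each of the d columns gets s nonzeros at distinct random rows,
    each equal to +1/sqrt(s) or -1/sqrt(s), so every column has unit norm.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(s)
    cols = []
    for _ in range(d):
        rows = rng.sample(range(k), s)  # s distinct target rows
        cols.append([(r, scale if rng.random() < 0.5 else -scale) for r in rows])
    return cols


def apply_sjlt(cols, x, k):
    """Compute y = S x, touching only s output entries per input coordinate."""
    y = [0.0] * k
    for j, xj in enumerate(x):
        if xj == 0.0:
            continue
        for r, v in cols[j]:
            # Scattered write to a random row: this is the irregular
            # memory access pattern that is hard to make fast on a GPU.
            y[r] += v * xj
    return y


# Illustrative usage: compare squared norms before and after sketching.
k, d, s = 64, 1000, 4
cols = make_sjlt(k, d, s, seed=0)
data_rng = random.Random(1)
x = [data_rng.gauss(0.0, 1.0) for _ in range(d)]
y = apply_sjlt(cols, x, k)
ratio = sum(v * v for v in y) / sum(v * v for v in x)
print(f"squared-norm ratio after sketching: {ratio:.3f}")
```

Each input coordinate costs only s multiply-adds, which is the arithmetic saving the abstract refers to. The price is that the destination rows are random, so neighbouring inputs write to unrelated locations. The ratio printed above fluctuates around 1 and depends on the seed.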

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Parallel Computing and Optimization Techniques