Sparton: Fast and Memory-Efficient Triton Kernel for Learned Sparse Retrieval

Thong Nguyen; Cosimo Rulli; Franco Maria Nardini; Rossano Venturini; Andrew Yates

arXiv:2603.25011·cs.IR·March 27, 2026

Open Access

TL;DR

Sparton is a specialized Triton GPU kernel for the LM head of learned sparse retrieval models. By fusing the head's operations into a single kernel, it runs faster and uses less memory than the unfused PyTorch baseline, which allows larger batch sizes and faster training.

Contribution

We introduce Sparton, a fused GPU kernel for the LM head in LSR models that avoids materializing the full logit matrix, improving both speed and memory efficiency.
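
To give a sense of scale, the following back-of-the-envelope sketch compares the size of the full logit tensor with the size of the pooled output that the model ultimately needs. The batch size, sequence length, and fp32 precision are illustrative assumptions rather than values from the paper; only the two vocabulary sizes are taken from the range quoted in the abstract.

```python
def logit_bytes(batch, seq_len, vocab, bytes_per_elem=4):
    """Bytes needed to store the full (batch, seq_len, vocab) logit tensor."""
    return batch * seq_len * vocab * bytes_per_elem


def pooled_bytes(batch, vocab, bytes_per_elem=4):
    """Bytes needed to store the (batch, vocab) output after max-pooling."""
    return batch * vocab * bytes_per_elem


# Assumed workload (not from the paper): 32 sequences of 256 tokens, fp32.
BATCH, SEQ_LEN = 32, 256

for vocab in (30_000, 250_000):
    full = logit_bytes(BATCH, SEQ_LEN, vocab)
    pooled = pooled_bytes(BATCH, vocab)
    print(
        f"V={vocab:>7,}: logits {full / 1e9:.2f} GB, "
        f"pooled output {pooled / 1e6:.2f} MB"
    )
```

Under these assumptions the intermediate tensor is larger than the pooled output by a factor equal to the sequence length. That ratio only describes the tensor shapes; it is not a measurement of Sparton's memory savings.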

Findings

01. Up to 4.8x speedup over PyTorch baseline

02. Order-of-magnitude memory reduction

03. Enables larger batch sizes and faster training in LSR models

Abstract

State-of-the-art Learned Sparse Retrieval (LSR) models, such as Splade, typically employ a Language Modeling (LM) head to project latent hidden states into a lexically-anchored logit matrix. This intermediate matrix is subsequently transformed into a sparse lexical representation through element-wise operations (ReLU, Log1P) and max-pooling over the sequence dimension. Despite its effectiveness, the LM head creates a massive memory bottleneck: the logit matrix has one column per vocabulary entry, and the vocabulary size (V) can range from 30,000 to over 250,000 tokens in recent models. Materializing this matrix limits model scaling, and the resulting I/O overhead between operators further throttles throughput and runtime performance. In this paper, we propose Sparton, a fast, memory-efficient Triton kernel tailored for the LM head in LSR models. Sparton utilizes a fused approach…
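
The pipeline described in the abstract (LM-head projection, ReLU, Log1P, max-pooling over the sequence) can be sketched in plain Python. This is a minimal illustration for a single sequence, not the Sparton kernel, whose details are elided in the truncated abstract. The function names, the `(vocab, hidden)` weight layout, and the omission of a bias term and of padding masks are all assumptions made for the example. The second function only shows why the full logit matrix does not have to be stored: because `log1p(relu(x))` is never negative, a running maximum initialised to zero gives the same result as pooling afterwards.

```python
import math
import random


def lm_head_unfused(hidden, weight):
    """Baseline: build the full (seq_len x vocab) logit matrix, then pool."""
    # hidden: seq_len rows of hidden states; weight: vocab rows (assumed layout).
    logits = [
        [sum(h * w for h, w in zip(h_row, w_row)) for w_row in weight]
        for h_row in hidden
    ]
    activated = [[math.log1p(max(x, 0.0)) for x in row] for row in logits]
    # Max over the sequence dimension: one weight per vocabulary term.
    return [max(column) for column in zip(*activated)]


def lm_head_streaming(hidden, weight):
    """Same result, keeping one running max per term instead of the matrix."""
    out = []
    for w_row in weight:
        best = 0.0  # log1p(relu(x)) >= 0, so 0.0 is a safe starting value
        for h_row in hidden:
            logit = sum(h * w for h, w in zip(h_row, w_row))
            best = max(best, math.log1p(max(logit, 0.0)))
        out.append(best)
    return out


# Tiny hypothetical sizes so the check runs instantly.
random.seed(0)
SEQ_LEN, HIDDEN, VOCAB = 5, 8, 12
hidden = [[random.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(SEQ_LEN)]
weight = [[random.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(VOCAB)]

reference = lm_head_unfused(hidden, weight)
streamed = lm_head_streaming(hidden, weight)
assert all(math.isclose(a, b) for a, b in zip(reference, streamed))
```

The streaming version needs extra storage proportional to the vocabulary size only, whereas the baseline stores a value for every (token, term) pair. A real GPU kernel would additionally tile the computation and handle batching and the backward pass, none of which this sketch attempts.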

Taxonomy

Topics: Topic Modeling · Generative Adversarial Networks and Image Synthesis · Machine Learning in Materials Science