Qrita: High-performance Top-k and Top-p Algorithm for GPUs using Pivot-based Truncation and Selection
Jongseok Park, Sunga Kim, Alvin Cheung, Ion Stoica

TL;DR
Qrita introduces a pivot-based, GPU-efficient algorithm for Top-k and Top-p truncation in large language models, improving throughput by up to 2x and halving memory use relative to sorting-based methods while producing the same output as the traditional algorithms.
Contribution
The paper presents Qrita, a novel pivot-based algorithm for Top-k and Top-p that reduces computation and memory overhead on GPUs, extending RTop-k techniques with Gaussian truncation and quaternary pivot search.
Findings
Up to 2x throughput improvement over existing kernels
Halves memory usage compared to sorting-based methods
Maintains identical output to traditional algorithms
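For context on what "identical output" is measured against, the following is a minimal sketch of sorting-based Top-k and Top-p truncation in plain Python. It is not taken from the paper: the function names (`softmax`, `top_k_sorted`, `top_p_sorted`) and the tie-breaking rule (lower index wins) are assumptions chosen for illustration, and a real GPU kernel would operate on batched tensors rather than Python lists.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_sorted(logits, k):
    """Indices of the k largest logits, found with a full sort.

    Ties are broken by lower index (an assumption for this sketch).
    """
    order = sorted(range(len(logits)), key=lambda i: (-logits[i], i))
    return sorted(order[:k])

def top_p_sorted(logits, p):
    """Smallest set of highest-probability tokens whose mass reaches p."""
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: (-probs[i], i))
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return sorted(kept)

logits = [2.0, 1.0, 0.5, 3.0, -1.0]
print(top_k_sorted(logits, 2))    # indices of the two largest logits
print(top_p_sorted(logits, 0.9))  # indices covering at least 90% of the mass
```

Both functions sort the whole vocabulary even though only a small prefix is kept, which is the overhead that selection-based approaches aim to avoid.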
Abstract
Top-k and Top-p are the dominant truncation operators in the sampling of large language models. Despite their widespread use, implementing them efficiently over large vocabularies remains a significant challenge. Existing approaches often rely on sorting, which incurs significant computation and memory overhead on GPUs, or on stochastic approaches, which alter the algorithm output. In this work, we propose Qrita, an efficient Top-k and Top-p algorithm based on a pivot-based selection strategy. Building on RTop-k, which uses a pivot-based search for node selection in graph neural networks, Qrita extends the concept of pivot-based search to both Top-k and Top-p with two key techniques: 1. Gaussian-based sigma-truncation, which greatly reduces the search space of the target elements, and 2. Quaternary pivot search with duplication handling, which halves the pivot search iterations and guarantees…
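To make the idea of pivot-based selection concrete, here is a minimal CPU sketch in plain Python. It is an illustration under stated assumptions, not Qrita's algorithm: it uses a simple binary (two-way) pivot search rather than the quaternary search named in the abstract, omits Gaussian sigma-truncation entirely, and resolves duplicates by sorting only the small band of values left between the final bounds. The function name `top_k_pivot`, the `max_iters` parameter, and the lower-index tie-break are all hypothetical.

```python
def top_k_pivot(values, k, max_iters=64):
    """Select indices of the k largest values by searching for a threshold.

    Invariants kept by the loop:
      count(v >= lo) >= k   and   count(v > hi) < k
    so the k-th largest value always lies in [lo, hi].
    """
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    lo, hi = min(values), max(values)
    for _ in range(max_iters):
        pivot = lo + (hi - lo) / 2.0
        if pivot <= lo or pivot >= hi:
            break  # bounds can no longer be separated in floating point
        count = sum(1 for v in values if v >= pivot)
        if count == k:
            # The pivot splits the input exactly; no tie handling needed.
            return [i for i, v in enumerate(values) if v >= pivot]
        if count > k:
            lo = pivot
        else:
            hi = pivot
    # Everything strictly above hi is certainly selected (fewer than k items).
    selected = [i for i, v in enumerate(values) if v > hi]
    # The remaining slots are filled from the narrow band [lo, hi], which
    # holds duplicates or near-duplicates of the k-th largest value.
    band = [i for i, v in enumerate(values) if lo <= v <= hi]
    band.sort(key=lambda i: (-values[i], i))
    selected.extend(band[: k - len(selected)])
    return sorted(selected)

values = [2.0, 1.0, 0.5, 3.0, -1.0]
print(top_k_pivot(values, 2))
print(top_k_pivot([1.0, 1.0, 1.0, 0.0], 2))  # duplicates at the threshold
```

Each iteration is a counting pass over the input, with no sort and no reordered copy of the data, which is the general reason pivot-based selection maps well to parallel hardware. The final band step is what keeps the result exact when several elements share the threshold value.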
Taxonomy
Topics: Graph Theory and Algorithms · Natural Language Processing Techniques · Ferroelectric and Negative Capacitance Devices
