Approximate Top-$k$ for Increased Parallelism

Oscar Key; Luka Ribar; Alberto Cattaneo; Luke Hudlass-Galley; Douglas Orr

arXiv:2412.04358·cs.LG·December 6, 2024

Open Access

TL;DR

This paper evaluates bucketed approximate top-$k$ algorithms, which relax the requirement that the result be exact in order to increase the parallelism of top-$k$ computation on machine learning accelerators, and analyzes their design choices.

Contribution

The paper provides a theoretical and empirical analysis of bucketed approximate top-$k$ algorithms, releases a fast bucketed top-$k$ implementation for PyTorch, and evaluates the algorithms on language model sparsity tasks.

Findings

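01. Bucketed approximate top-$k$ algorithms increase parallelism by replacing one large top-$k$ with many smaller, independent top-$k$ operations.

02. Relaxing the exactness requirement makes top-$k$ computation more scalable on highly parallel accelerators.

03. Empirical results show that the algorithms are effective for language model sparsity.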

Abstract

We present an evaluation of bucketed approximate top-$k$ algorithms. Computing top-$k$ exactly suffers from limited parallelism, because the $k$ largest values must be aggregated along the vector, and so it is not well suited to computation on highly parallel machine learning accelerators. By relaxing the requirement that the top-$k$ is exact, bucketed algorithms can dramatically increase the available parallelism by independently computing many smaller top-$k$ operations. We explore the design choices of this class of algorithms using both theoretical analysis and empirical evaluation on downstream tasks. Our motivating examples are sparsity algorithms for language models, which often use top-$k$ to select the most important parameters or activations. We also release a fast bucketed top-$k$ implementation for PyTorch.
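The bucketing idea described in the abstract can be sketched as follows. This is a minimal pure-Python illustration, not the PyTorch implementation the authors release. The function names `exact_topk`, `bucketed_topk`, and `recall`, the use of contiguous equal-size buckets, and the even split of $k$ across buckets are all assumptions made for this example; the paper itself studies such design choices.

```python
import heapq


def exact_topk(values, k):
    """Return the k largest entries as (value, index) pairs, largest first."""
    return heapq.nlargest(k, ((v, i) for i, v in enumerate(values)))


def bucketed_topk(values, k, num_buckets):
    """Approximate top-k: run a small, independent top-k inside each bucket.

    Assumption for this sketch: buckets are contiguous, equal-size slices,
    and each bucket contributes exactly k // num_buckets entries.
    """
    n = len(values)
    if num_buckets <= 0 or n % num_buckets or k % num_buckets:
        raise ValueError("len(values) and k must be divisible by num_buckets")
    bucket_size = n // num_buckets
    k_per_bucket = k // num_buckets

    selected = []
    # Each loop iteration touches only its own bucket, so on a parallel
    # machine all buckets could be processed at the same time.
    for b in range(num_buckets):
        start = b * bucket_size
        bucket = values[start:start + bucket_size]
        local = heapq.nlargest(
            k_per_bucket, ((v, start + i) for i, v in enumerate(bucket))
        )
        selected.extend(local)
    return selected


def recall(approx, exact):
    """Fraction of the exact top-k indices that the approximation recovered."""
    exact_indices = {i for _, i in exact}
    return sum(1 for _, i in approx if i in exact_indices) / len(exact)
```

For the vector `[9, 8, 1, 0, 7, 2, 3, 4]` with $k = 2$ and two buckets, this sketch selects 9 and 7, whereas the exact top-2 is 9 and 8, giving a recall of 0.5. The loss comes from the two largest values falling into the same bucket, which is the accuracy cost traded for independent per-bucket work.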

Taxonomy

Topics: Complexity and Algorithms in Graphs · Advanced Graph Theory Research · Coding theory and cryptography