A Faster Generalized Two-Stage Approximate Top-K
Yashas Samaga, Varun Yerram, Spandana Raj Babbula, Prateek Jain, Praneeth Netrapalli

TL;DR
This paper generalizes a two-stage approximate Top-K selection algorithm to improve efficiency and speed on accelerators, providing theoretical bounds and demonstrating significant speedups on Cloud TPUv5e.
Contribution
It introduces a generalized first stage that keeps the top K' elements of each partition instead of only the largest one. The paper supports this with a theoretical analysis and a practical implementation that improve speed while maintaining recall.
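The generalized scheme can be illustrated with a minimal pure-Python sketch. The function names `two_stage_approx_top_k` and `recall` and the parameters `num_buckets` and `k_prime` are hypothetical. Splitting the input into contiguous chunks is an assumption; on randomly ordered input it behaves like the random partitioning used in the analysis. This is not the authors' accelerator implementation and makes no attempt at the vectorized form an accelerator needs. With `k_prime=1` it reduces to the original two-stage scheme of Chern et al. (2022).

```python
import random
from typing import List, Sequence


def two_stage_approx_top_k(
    x: Sequence[float], k: int, num_buckets: int, k_prime: int
) -> List[float]:
    """Illustrative generalized two-stage approximate Top-K (hypothetical helper).

    Stage 1: split x into `num_buckets` equal contiguous chunks and keep the
             top-`k_prime` values of each chunk (k_prime=1 is the original scheme).
    Stage 2: sort the num_buckets * k_prime survivors and return the top k.
    """
    n = len(x)
    if n % num_buckets != 0:
        raise ValueError("len(x) must be divisible by num_buckets")
    if num_buckets * k_prime < k:
        raise ValueError("stage 1 must keep at least k candidates")
    m = n // num_buckets  # partition size
    survivors: List[float] = []
    for b in range(num_buckets):
        chunk = x[b * m:(b + 1) * m]
        survivors.extend(sorted(chunk, reverse=True)[:k_prime])  # stage 1
    return sorted(survivors, reverse=True)[:k]                   # stage 2


def recall(approx: Sequence[float], x: Sequence[float], k: int) -> float:
    """Fraction of the exact top-k values that `approx` recovers (distinct values assumed)."""
    exact = set(sorted(x, reverse=True)[:k])
    return len(exact.intersection(approx)) / k


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.random() for _ in range(4096)]  # randomly ordered input
    for num_buckets, k_prime in [(512, 1), (64, 4)]:
        approx = two_stage_approx_top_k(x, k=64, num_buckets=num_buckets, k_prime=k_prime)
        print(num_buckets, k_prime, num_buckets * k_prime, recall(approx, x, 64))
```

Each printed row gives the partition count, K', the second-stage input size `num_buckets * k_prime`, and the measured recall on one random input. A single run is noisy, so averaging over many random inputs gives a better basis for comparing configurations.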
Findings
The derived bound on expected recall is tighter than the one in previous work.
Choosing a larger K' with fewer partitions reduces the input size of the second stage more effectively.
Achieves ~10x speedup on Cloud TPUv5e without losing recall.
Abstract
We consider the Top-K selection problem, which aims to identify the K largest elements in an array. Top-K selection arises in many machine learning algorithms and often becomes a bottleneck on accelerators, which are optimized for dense matrix multiplications. To address this problem, Chern et al. (2022) proposed a fast two-stage approximate Top-K algorithm that: (i) partitions the input array into equal-sized chunks and selects the top-1 element from each partition; and (ii) sorts the resulting smaller subset and returns the top K elements. In this paper, we generalize the first stage so that each partition selects the top K' elements (for 1 ≤ K' ≤ K). Our contributions include: (i) an expression for the expected recall of this generalized algorithm under random partitioning, and a demonstration that choosing K' > 1 with fewer partitions in the first stage more…
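Under random partitioning, the expected recall can be computed directly. The following derivation is our own illustration and may not match the form of the expression in the paper. Split n distinct values uniformly at random into B equal buckets of size m = n/B. The rank-i element (i = 1 is the largest) survives the first stage exactly when fewer than K' of the i − 1 larger elements share its bucket. That count is hypergeometric: m − 1 draws from the n − 1 other elements, i − 1 of which are larger. If B·K' ≥ K, every true top-K element that survives stage 1 is also kept by stage 2, because at most K − 1 candidates can exceed it. Hence E[recall] = (1/K) Σ_{i=1..K} Σ_{j=0..K'−1} C(i−1, j)·C(n−i, m−1−j) / C(n−1, m−1). The sketch below evaluates this formula and cross-checks it with a seeded Monte Carlo simulation. The function names are hypothetical.

```python
import random
from math import comb


def expected_recall(n: int, k: int, num_buckets: int, k_prime: int) -> float:
    """Exact expected recall under a uniformly random equal-size partition (our derivation)."""
    if n % num_buckets or num_buckets * k_prime < k or k > n:
        raise ValueError("need n % num_buckets == 0, num_buckets * k_prime >= k, k <= n")
    m = n // num_buckets                 # bucket size
    total = comb(n - 1, m - 1)           # ways to fill the other m - 1 slots of a bucket
    acc = 0.0
    for i in range(1, k + 1):            # i = rank of a true top-k element
        # j = number of larger elements sharing its bucket; it survives iff j < k_prime.
        # Capping j at m - 1 keeps the second comb argument non-negative.
        favourable = sum(
            comb(i - 1, j) * comb(n - i, m - 1 - j)
            for j in range(min(k_prime, m))
        )
        acc += favourable / total
    return acc / k


def simulated_recall(n, k, num_buckets, k_prime, trials=1000, seed=0):
    """Monte Carlo estimate: shuffle, split into contiguous buckets, run both stages."""
    rng = random.Random(seed)
    m = n // num_buckets
    true_top = set(range(n - k, n))      # values n-k .. n-1 are the exact top-k
    hits = 0
    for _ in range(trials):
        x = list(range(n))
        rng.shuffle(x)                   # random order + contiguous chunks = random partition
        survivors = []
        for b in range(num_buckets):
            survivors.extend(sorted(x[b * m:(b + 1) * m], reverse=True)[:k_prime])
        result = sorted(survivors, reverse=True)[:k]
        hits += len(true_top.intersection(result))
    return hits / (trials * k)


if __name__ == "__main__":
    n, k = 1024, 32
    for num_buckets, k_prime in [(256, 1), (64, 2), (32, 4)]:
        print(num_buckets, k_prime, num_buckets * k_prime,
              round(expected_recall(n, k, num_buckets, k_prime), 4),
              round(simulated_recall(n, k, num_buckets, k_prime, trials=300), 4))
```

As a small check, `expected_recall(4, 2, 2, 1)` is 5/6. The largest element always survives. The second-largest survives unless the largest lands in its bucket, which happens with probability 1/3.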
