Run Generation Revisited: What Goes Up May or May Not Come Down
Michael A. Bender (1), Samuel McCauley (1), Andrew McGregor (2), Shikha Singh (1), Hoa T. Vu (2) ((1) Stony Brook University, (2) University of Massachusetts, Amherst)

TL;DR
This paper revisits run generation, the first phase of external-memory sorting. It analyzes online and offline algorithms and introduces strategies that use resource augmentation and foresight to minimize the total number of runs (equivalently, to maximize the average run length).
Contribution
It analyzes run generation algorithms in the online and offline settings, proves that the simple alternating-up-down replacement selection policy is asymptotically optimal, and introduces new algorithms that use resource augmentation and foresight to improve performance.
Findings
Alternating-up-down replacement selection is asymptotically optimal.
With resource augmentation, online algorithms can beat the bounds that apply without it.
Foresight and input structure can significantly improve run generation efficiency.
Abstract
In this paper, we revisit the classic problem of run generation. Run generation is the first phase of external-memory sorting, where the objective is to scan through the data, reorder elements using a small buffer of size M, and output runs (contiguously sorted chunks of elements) that are as long as possible. We develop algorithms for minimizing the total number of runs (or equivalently, maximizing the average run length) when the runs are allowed to be sorted or reverse sorted. We study the problem in the online setting, both with and without resource augmentation, and in the offline setting. (1) We analyze alternating-up-down replacement selection (runs alternate between sorted and reverse sorted), which was studied by Knuth as far back as 1963. We show that this simple policy is asymptotically optimal. Specifically, we show that alternating-up-down replacement selection is…
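To make the policy described in the abstract concrete, the following is a minimal sketch of replacement selection with a buffer of size M in which runs alternate between sorted and reverse sorted. It is an illustration under stated assumptions, not the paper's own formulation: the function name `alternating_up_down_runs`, the choice to start with an ascending run, and the heap-based buffer are all assumptions made here.

```python
import heapq
from itertools import islice
from typing import Iterable, List

_END = object()  # sentinel marking an exhausted input stream


def alternating_up_down_runs(data: Iterable[int], m: int) -> List[List[int]]:
    """Hypothetical sketch: generate runs with a buffer of m elements,
    alternating between ascending and descending runs (ascending first,
    which is an assumption of this sketch)."""
    if m < 1:
        raise ValueError("buffer size m must be at least 1")
    stream = iter(data)
    pending = list(islice(stream, m))  # initial buffer fill
    runs: List[List[int]] = []
    sign = 1  # +1 for an ascending run, -1 for a descending run
    while pending:
        # Negating keys lets a min-heap serve descending runs too.
        current = [sign * x for x in pending]
        heapq.heapify(current)
        pending = []
        run: List[int] = []
        while current:
            key = heapq.heappop(current)
            run.append(sign * key)
            nxt = next(stream, _END)
            if nxt is _END:
                continue  # input exhausted; keep draining the buffer
            if sign * nxt >= key:
                # The new element can still extend the current run.
                heapq.heappush(current, sign * nxt)
            else:
                # It must wait in the buffer for the next run.
                pending.append(nxt)
        runs.append(run)
        sign = -sign  # flip direction for the next run
    return runs


if __name__ == "__main__":
    print(alternating_up_down_runs([1, 5, 3, 2, 4, 6], 2))
```

At every step the elements held for the current run plus those held for the next run total at most M, so the sketch respects the buffer bound. On the input 1, 5, 3, 2, 4, 6 with M = 2 it emits an ascending run, then a descending run, then a final single-element run.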
Taxonomy
Topics: Optimization and Search Problems · Algorithms and Data Compression · Distributed systems and fault tolerance
