Leyenda: An Adaptive, Hybrid Sorting Algorithm for Large Scale Data with Limited Memory
Yuanjing Shi, Zhaoxing Li

TL;DR
Leyenda is an adaptive, hybrid sorting algorithm designed for large-scale data with limited memory, optimizing disk I/O and CPU cache usage to outperform existing methods in various environments.
Contribution
The paper introduces Leyenda, a novel hybrid Radix MSB MergeSort algorithm that adapts to hardware conditions for efficient internal and external sorting.
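The summary names the technique but not its internals. The following Python sketch illustrates one plausible reading of a hybrid MSB radix / merge sort and should not be taken as the authors' implementation. Keys are scattered into 256 buckets by their most significant byte, and each bucket is then sorted with merge sort. Using a whole byte as the radix digit is an assumption, since the paper speaks of the most significant bit. Every key in bucket i is smaller than every key in bucket i+1, so concatenating the sorted buckets yields sorted output. `radix_msb_merge_sort` and `merge_sort` are hypothetical names.

```python
from typing import List


def merge_sort(keys: List[bytes]) -> List[bytes]:
    """Top-down, stable merge sort over byte-string keys."""
    if len(keys) <= 1:
        return list(keys)
    mid = len(keys) // 2
    left, right = merge_sort(keys[:mid]), merge_sort(keys[mid:])
    merged: List[bytes] = []
    i = j = 0
    # Repeatedly take the smaller head element; '<=' keeps the sort stable.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def radix_msb_merge_sort(keys: List[bytes]) -> List[bytes]:
    """Hypothetical hybrid: one MSB radix pass (first byte), then merge sort per bucket."""
    buckets: List[List[bytes]] = [[] for _ in range(256)]
    out: List[bytes] = []  # empty keys compare smaller than any non-empty key
    for k in keys:
        if k:
            buckets[k[0]].append(k)  # scatter by the most significant byte
        else:
            out.append(k)
    # Buckets are already in key order, so sorting each and concatenating suffices.
    for bucket in buckets:
        out.extend(merge_sort(bucket))
    return out
```

For example, `radix_msb_merge_sort([b"b", b"a", b"", b"ab"])` returns `[b"", b"a", b"ab", b"b"]`. Because the buckets are independent, they are natural units of parallel work, which is presumably where the thread-level parallelism and CPU cache use mentioned in the abstract come in; the sketch sorts them sequentially to stay short.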
Findings
Outperforms GNU's parallel quicksort and merge sort by up to a factor of three
Ranked second in the ACM SIGMOD 2019 external sort contest
Achieves top overall performance in large-scale sorting
Abstract
Sorting is one of the fundamental tasks of modern data management systems. Disk I/O is the most commonly blamed performance bottleneck, but as workloads become more computation-intensive, we observe that in heterogeneous environments the bottleneck may vary across infrastructure. As a result, sort kernels need to adapt to changing hardware conditions. In this paper, we propose Leyenda, a hybrid, parallel and efficient Radix Most-Significant-Bit (MSB) MergeSort algorithm that makes use of local thread-level CPU cache and efficient disk/memory I/O. Leyenda can perform either internal or external sort efficiently, depending on I/O and processing conditions. We benchmarked Leyenda with three different workloads from Sort Benchmark, targeting three distinct use cases (internal, partially in-memory and external sort), and we found…
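The abstract does not spell out how Leyenda decides between internal and external sorting. The Python sketch below shows only the textbook baseline that such a decision builds on, under a simplifying assumption: a fixed memory budget `mem_records`, counted in records, is the sole criterion. Leyenda reportedly also weighs I/O and processing conditions. If the input fits the budget, it is sorted in memory. Otherwise, sorted runs are written to temporary files and then combined with a k-way heap merge (`heapq.merge`). Records are assumed to be fixed-length byte strings sorted by a 10-byte key prefix, in the style of Sort Benchmark data. `adaptive_sort`, `mem_records` and `KEY_LEN` are hypothetical names.

```python
import heapq
import os
import tempfile
from typing import Iterator, List

KEY_LEN = 10  # assumed key length (Sort Benchmark-style 10-byte keys)


def _key(record: bytes) -> bytes:
    return record[:KEY_LEN]


def _read_run(path: str, rec_len: int) -> Iterator[bytes]:
    """Stream fixed-length records back from one sorted run file."""
    with open(path, "rb") as f:
        while True:
            rec = f.read(rec_len)
            if not rec:
                return
            yield rec


def adaptive_sort(records: List[bytes], mem_records: int) -> List[bytes]:
    """Hypothetical adaptive sort: internal if the input fits the budget, else external."""
    if mem_records < 1:
        raise ValueError("mem_records must be positive")
    if len(records) <= mem_records:
        return sorted(records, key=_key)  # internal (in-memory) sort
    rec_len = len(records[0])  # all records assumed to share this length
    paths: List[str] = []
    try:
        # Phase 1: sort budget-sized chunks and spill each one to disk as a run.
        for start in range(0, len(records), mem_records):
            run = sorted(records[start:start + mem_records], key=_key)
            fd, path = tempfile.mkstemp()
            paths.append(path)
            with os.fdopen(fd, "wb") as f:
                f.write(b"".join(run))
        # Phase 2: k-way merge of all runs, streaming them back from disk.
        streams = [_read_run(p, rec_len) for p in paths]
        return list(heapq.merge(*streams, key=_key))
    finally:
        for p in paths:
            os.remove(p)
```

For brevity, the sketch takes its input as an in-memory list and returns the merged result as a list. A real external sort would read the input in budget-sized chunks and stream the merged output back to disk.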
Taxonomy
Topics: Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management
