EvoSort: A Genetic-Algorithm-Based Adaptive Parallel Sorting Framework for Large-Scale High Performance Computing
Shashank Raj, Kalyanmoy Deb

TL;DR
EvoSort is an adaptive parallel sorting framework that uses a genetic algorithm to optimize sorting parameters, significantly improving performance for large-scale data processing in Python environments.
Contribution
It introduces a novel GA-based auto-tuning approach for parallel sorting, enhancing performance and adaptability across diverse data types and hardware.
Findings
Achieves up to 225x speedup over existing methods
Successfully handles up to 10 billion elements across various distributions
Demonstrates consistent performance improvements on multiple hardware platforms
Abstract
We present EvoSort, a general-purpose adaptive parallel sorting framework accessible at the Python level. EvoSort employs a Genetic Algorithm (GA) to automatically discover and refine critical parameters, including insertion sort thresholds and algorithm selection (e.g., whether to use LSD radix sort). By adapting continuously to input data and system architecture, EvoSort provides a drop-in replacement for the standard sorting routines of Python libraries such as NumPy and Pandas. Experiments on up to 10 billion elements across nine data distributions and two hardware platforms demonstrate that EvoSort consistently outperforms competing methods. Results show speedups of up to 225x, exemplifying a powerful auto-tuning solution for large-scale data processing.
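The idea of tuning an insertion sort threshold with a GA can be illustrated with a small sketch. This is not EvoSort's implementation: it is a serial, pure-Python toy that assumes a merge sort falling back to insertion sort on small slices, wall-clock time as the fitness, and a simple averaging crossover with integer mutation. The names `hybrid_sort`, `fitness`, and `tune_threshold`, and all default values, are hypothetical.

```python
import random
import time


def insertion_sort(a, lo, hi):
    """Sort the slice a[lo:hi] in place."""
    for i in range(lo + 1, hi):
        key = a[i]
        j = i - 1
        while j >= lo and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key


def hybrid_sort(a, threshold):
    """Merge sort that switches to insertion sort on slices of size <= threshold."""
    threshold = max(1, threshold)  # a threshold below 1 would never terminate

    def rec(lo, hi):
        if hi - lo <= threshold:
            insertion_sort(a, lo, hi)
            return
        mid = (lo + hi) // 2
        rec(lo, mid)
        rec(mid, hi)
        merged = []
        i, j = lo, mid
        while i < mid and j < hi:
            if a[i] <= a[j]:
                merged.append(a[i])
                i += 1
            else:
                merged.append(a[j])
                j += 1
        merged.extend(a[i:mid])
        merged.extend(a[j:hi])
        a[lo:hi] = merged

    rec(0, len(a))
    return a


def fitness(threshold, sample):
    """Assumed fitness: seconds taken to sort a copy of the sample (lower is better)."""
    data = list(sample)
    start = time.perf_counter()
    hybrid_sort(data, threshold)
    return time.perf_counter() - start


def tune_threshold(sample, lo=1, hi=128, pop_size=8, generations=5, seed=0):
    """Toy GA: keep the faster half, refill with mutated averages of two parents."""
    rng = random.Random(seed)
    population = [rng.randint(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=lambda t: fitness(t, sample))
        parents = scored[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            p, q = rng.sample(parents, 2)
            child = (p + q) // 2 + rng.randint(-4, 4)  # crossover plus mutation
            children.append(min(hi, max(lo, child)))   # clamp to the search range
        population = parents + children
    return min(population, key=lambda t: fitness(t, sample))


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.randint(0, 10**6) for _ in range(20_000)]
    best = tune_threshold(data[:2_000])
    print("tuned threshold:", best)
    assert hybrid_sort(list(data), best) == sorted(data)
```

Because the fitness is a timing measurement, the tuned threshold varies between runs and machines. That variation is what makes the parameter worth tuning per system rather than hard-coding.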
