Balanced Learned Sort: a new learned model for fast and balanced item bucketing

Paolo Ferragina; Mattia Odorisio

arXiv:2407.00734·cs.DS·July 3, 2024

Open Access

TL;DR

This paper explores learned models for distribution-based sorting, introducing novel models that improve speed and balance, and demonstrates their superior performance on large synthetic and real datasets.

Contribution

It proposes new learned models for sorting that are space-efficient, monotonic, and fast, and integrates them into a new sorting algorithm with superior experimental results.

Findings

01. Learned models outperform traditional methods on most datasets.

02. New models are space-efficient, monotonic, and fast.

03. Proposed sorters outperform existing ones on 31 out of 33 datasets.

Abstract

This paper aims to better understand the strengths and limitations of adopting learned approaches in sequentially sorting numerical data, via two main research steps. First, we study different learned models for distribution-based sorting, starting from some known ones (i.e., two-layer RMI or simple linear models) and then introducing some novel models that either improve the two-layer RMI or are fully new in their algorithmic structure, and are thus space-efficient, monotonic, and very fast in building balanced buckets. We test those models over 11 synthetic datasets of 200M 64-bit floating-point items drawn from different distributions, thereby deriving hints about their ultimate performance and usefulness in designing a sorting algorithm. Based on these findings, we select the best of these models, plug them into a new learned algorithmic scheme, and devise three new sorters…
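To make the idea of distribution-based sorting with a learned model concrete, here is a minimal sketch that uses the simplest kind of model the abstract mentions, a single linear function fitted to the minimum and maximum key. It is not the authors' algorithm: the names `bucket_index` and `linear_model_sort`, the default of 8 buckets, and the use of Python's built-in sort inside each bucket are all assumptions made for illustration, and the sketch assumes finite input values.

```python
from typing import List


def bucket_index(x: float, lo: float, hi: float, num_buckets: int) -> int:
    """Hypothetical linear model: map a key in [lo, hi] to a bucket id.

    The mapping is monotone (a larger key never gets a smaller bucket id),
    so concatenating the sorted buckets yields a sorted sequence.
    """
    idx = int((x - lo) / (hi - lo) * num_buckets)
    # The maximum key maps to num_buckets, so clamp it into the last bucket.
    return min(idx, num_buckets - 1)


def linear_model_sort(items: List[float], num_buckets: int = 8) -> List[float]:
    """Distribute items into buckets with the linear model, then sort each bucket."""
    if len(items) < 2:
        return list(items)
    lo, hi = min(items), max(items)
    if lo == hi:
        # All keys are equal: nothing to distribute.
        return list(items)
    buckets: List[List[float]] = [[] for _ in range(num_buckets)]
    for x in items:
        buckets[bucket_index(x, lo, hi, num_buckets)].append(x)
    result: List[float] = []
    for bucket in buckets:
        bucket.sort()  # stand-in for whatever per-bucket sorter is used
        result.extend(bucket)
    return result
```

On roughly uniform keys this linear model spreads items evenly across buckets, while on skewed keys most items can land in a few buckets. That imbalance is the kind of weakness that motivates richer models able to build balanced buckets.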

Taxonomy

Topics: Educational Technology and Assessment