Approximate sorting and its application in I/O model
Tianpeng Gao, Jianzhong Li

TL;DR
This paper introduces a new external metric for approximate sorting in big data, proposes an optimal algorithm called EASORT, and explores its applications in indexing and query processing.
Contribution
It develops a novel external metric and an errors metric for approximate sorting, proves lower bounds, and presents EASORT, an asymptotically optimal sorting algorithm for the I/O model.
Findings
EASORT is asymptotically optimal for external approximate sorting.
New external metrics effectively evaluate approximate sorted results.
Applications include indexing and join operations on approximate data.
Abstract
This paper considers approximate sorting for big data. The goal of approximate sorting for big data is to generate an approximately sorted result while using less CPU and I/O cost. For big data, we consider approximate sorting in the I/O model. The existing metrics on permutation space are not applicable to external approximate sorting algorithms. Thus, we propose a new kind of metric, named the external metric, which ignores the errors and dislocation that occur within each I/O block. The external Spearman's footrule metric is an example of an external metric, corresponding to Spearman's footrule metric. Furthermore, to facilitate a better evaluation of the approximate sorted result, we propose a new metric, named errors, which directly states the number of dislocated elements. Its external counterpart, external errors, is also considered in this paper. Then, according to the rate-distortion…
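The metrics named in the abstract can be illustrated with a small sketch. The abstract is truncated and does not give formal definitions, so everything below is an assumption made for illustration: keys are distinct, Spearman's footrule is the usual sum of absolute position displacements, "errors" counts elements that are not at their sorted position, and the external variants compare block indices (position divided by a hypothetical `block_size`) instead of exact positions, so that dislocation inside one I/O block is ignored. The function names are hypothetical and may not match the paper's definitions.

```python
def _sorted_positions(seq):
    # Map each key to its index in the fully sorted order (assumes distinct keys).
    return {v: i for i, v in enumerate(sorted(seq))}


def footrule(seq):
    # Spearman's footrule: total displacement of elements from their sorted positions.
    target = _sorted_positions(seq)
    return sum(abs(i - target[v]) for i, v in enumerate(seq))


def errors(seq):
    # Assumed "errors" metric: number of elements not at their sorted position.
    target = _sorted_positions(seq)
    return sum(1 for i, v in enumerate(seq) if i != target[v])


def external_footrule(seq, block_size):
    # Assumed external variant: displacement measured in blocks, not positions.
    target = _sorted_positions(seq)
    return sum(abs(i // block_size - target[v] // block_size)
               for i, v in enumerate(seq))


def external_errors(seq, block_size):
    # Assumed external variant: number of elements stored in the wrong block.
    target = _sorted_positions(seq)
    return sum(1 for i, v in enumerate(seq)
               if i // block_size != target[v] // block_size)


if __name__ == "__main__":
    within_block = [2, 1, 3, 4, 6, 5, 8, 7]   # only swaps inside blocks of size 2
    across_block = [3, 1, 2, 4]               # some elements sit in the wrong block
    for data in (within_block, across_block):
        print(data, footrule(data), errors(data),
              external_footrule(data, 2), external_errors(data, 2))
```

Under these assumptions, a sequence whose only disorder lies inside individual blocks has nonzero footrule and errors values but zero external footrule and external errors, which matches the abstract's description of an external metric ignoring dislocation within each I/O block.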
Taxonomy
Topics: Data Management and Algorithms · Data Mining Algorithms and Applications · Algorithms and Data Compression
