Approximate sorting and its application in I/O model

Tianpeng Gao; Jianzhong Li

arXiv:2208.10298 · cs.DS · March 29, 2023 · 1 cites

Open Access

TL;DR

This paper introduces a new kind of metric, the external metric, for evaluating approximate sorting of big data in the I/O model, proposes an asymptotically optimal algorithm called EASORT, and explores its applications in indexing and query processing.

Contribution

It develops a novel external metric and a new errors metric for approximate sorting, proves lower bounds, and presents EASORT, an asymptotically optimal sorting algorithm for the I/O model.

Findings

01

EASORT is asymptotically optimal for external approximate sorting.

02

New external metrics effectively evaluate approximate sorted results.

03

Applications include indexing and join operations on approximate data.

Abstract

This paper considers approximate sorting for big data. The goal is to produce an approximately sorted result at lower CPU and I/O cost. For big data, we consider approximate sorting in the I/O model. Existing metrics on permutation space are not suitable for external approximate sorting algorithms. We therefore propose a new kind of metric, the external metric, which ignores errors and dislocations that occur within each I/O block. The external Spearman's footrule metric is the external counterpart of Spearman's footrule metric. Furthermore, to better evaluate approximately sorted results, we propose a new metric named errors, which directly states the number of dislocated elements. Its external counterpart, external errors, is also considered in this paper. Then, according to the rate-distortion…
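To make the four metrics named in the abstract concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's definitions: the abstract is truncated and gives no formulas, so the "external" variants below are one plausible reading in which an element counts as misplaced only if it lands in a different I/O block than the one it belongs to. The function names, the `perm` representation (`perm[i]` is the 0-based true rank of the element at output position `i`), and the `block_size` parameter are all hypothetical.

```python
def footrule(perm):
    """Spearman's footrule: total displacement sum_i |i - perm[i]|."""
    return sum(abs(i - r) for i, r in enumerate(perm))


def errors(perm):
    """Number of elements that are not at their sorted position."""
    return sum(1 for i, r in enumerate(perm) if i != r)


def external_footrule(perm, block_size):
    """ASSUMED external variant: displacement measured in whole blocks,
    so movement inside a single I/O block contributes nothing."""
    return sum(abs(i // block_size - r // block_size)
               for i, r in enumerate(perm))


def external_errors(perm, block_size):
    """ASSUMED external variant: number of elements stored in a block
    other than the one holding their sorted position."""
    return sum(1 for i, r in enumerate(perm)
               if i // block_size != r // block_size)


# Every swap stays inside a block of size 2.
within_blocks = [1, 0, 3, 2, 5, 4, 7, 6]
# One swap crosses the boundary between block 0 and block 1.
across_blocks = [0, 2, 1, 3]

print(footrule(within_blocks), errors(within_blocks))
print(external_footrule(within_blocks, 2), external_errors(within_blocks, 2))
print(external_footrule(across_blocks, 2), external_errors(across_blocks, 2))
```

Under this reading, the first permutation is far from sorted by the ordinary metrics yet has zero external distance, because each block already contains exactly the elements it should hold. This matches the abstract's description of an external metric as one that ignores dislocation inside each I/O block.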

Taxonomy

Topics: Data Management and Algorithms · Data Mining Algorithms and Applications · Algorithms and Data Compression