Optimization Strategies for Parallel Computation of Skylines

Paolo Ciaccia; Davide Martinenghi

arXiv:2411.14968 · cs.DB · November 25, 2024

TL;DR

This paper reviews existing partitioning methods for parallel skyline computation and introduces two new optimization strategies that reduce its computational overhead. The strategies are evaluated in a multi-core environment using PySpark.

Contribution

The paper proposes two orthogonal optimization strategies for partition-based parallel skyline computation and compares their performance experimentally.

Findings

01. The proposed optimization strategies reduce computational overhead.

02. Parallelization improves the efficiency of skyline query evaluation.

03. Experimental results in a multi-core PySpark environment demonstrate performance gains.

Abstract

Skyline queries are among the most widely adopted tools for Multi-Criteria Analysis, with applications in diverse domains such as Database Systems, Data Mining, and Decision Making. Skylines offer a useful overview of the most suitable alternatives in a dataset, while discarding all the options that are dominated by (i.e., worse than) others. The intrinsically quadratic complexity of skyline computation has pushed researchers to identify strategies for parallelizing the task, particularly by partitioning the dataset at hand. In this paper, after reviewing the main partitioning approaches available in the relevant literature, we propose two orthogonal optimization strategies for reducing the computational overhead, and compare them experimentally in a multi-core environment equipped with PySpark.
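The dominance relation and the partition-then-merge idea described in the abstract can be illustrated with a small sketch. It is not the authors' method and does not implement their two optimization strategies. It makes several assumptions of its own: all attributes are minimized, points are tuples of floats, the dataset is split round-robin (a simple random-style partitioning), and a thread pool stands in for Spark workers. The function names `dominates`, `bnl_skyline`, and `parallel_skyline` are hypothetical. The merge step is correct because a global skyline point is dominated by no point at all, so it is also undominated within its own partition and must appear in that partition's local skyline.

```python
# Minimal sketch of partition-based skyline computation (hypothetical names).
# Assumptions: lower is better on every attribute; threads stand in for Spark workers.
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """True if p is no worse than q everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def bnl_skyline(points: Sequence[Point]) -> List[Point]:
    """Block-Nested-Loops skyline: O(n^2) dominance tests in the worst case."""
    window: List[Point] = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue  # p is dominated by a current candidate: discard it
        # p survives: evict candidates it dominates, then keep p
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


def parallel_skyline(points: Sequence[Point], num_partitions: int = 4) -> List[Point]:
    """Round-robin partitioning, parallel local skylines, then a final merge."""
    partitions = [list(points[i::num_partitions]) for i in range(num_partitions)]
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        local_skylines = list(pool.map(bnl_skyline, partitions))
    # Every global skyline point survives in its local skyline,
    # so one more pass over the union of local skylines suffices.
    candidates = [p for local in local_skylines for p in local]
    return bnl_skyline(candidates)


if __name__ == "__main__":
    # (price, distance) pairs, both to be minimized
    hotels = [(50.0, 3.0), (80.0, 1.0), (60.0, 2.0), (90.0, 2.5), (55.0, 3.5), (70.0, 1.5)]
    print(sorted(parallel_skyline(hotels, num_partitions=2)))
    # → [(50.0, 3.0), (60.0, 2.0), (70.0, 1.5), (80.0, 1.0)]
```

The final merge is itself a sequential quadratic pass over all local skyline points, which is one reason the choice of partitioning matters: partitions whose local skylines contain many points that are not in the global skyline inflate the merge cost. In PySpark, the local step would map naturally onto `RDD.mapPartitions`, but that wiring is left out here so the sketch runs without Spark.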

Taxonomy

Topics: Data Management and Algorithms