Scaling Package Queries to a Billion Tuples via Hierarchical Partitioning and Customized Optimization

Anh L. Mai, Pengyu Wang, Azza Abouzied, Matteo Brucato, Peter J. Haas, and Alexandra Meliou

arXiv:2307.02860·cs.DB·November 16, 2023·2 cites

TL;DR

This paper introduces Progressive Shading, an algorithm for processing package queries that uses hierarchical partitioning and customized optimization to scale to billions of tuples and to handle tight constraints, both of which the earlier SketchRefine algorithm struggles with.

Contribution

It presents a novel hierarchical partitioning approach and optimized ILP/LP solvers that enable package query processing at unprecedented data scales.

Findings

01. Scales to billions of tuples efficiently.
02. Handles very tight constraints gracefully.
03. Outperforms traditional partitioning schemes.

Abstract

A package query returns a package - a multiset of tuples - that maximizes or minimizes a linear objective function subject to linear constraints, thereby enabling in-database decision support. Prior work has established the equivalence of package queries to Integer Linear Programs (ILPs) and developed the SketchRefine algorithm for package query processing. While this algorithm was an important first step toward supporting prescriptive analytics scalably inside a relational database, it struggles when the data size grows beyond a few hundred million tuples or when the constraints become very tight. In this paper, we present Progressive Shading, a novel algorithm for processing package queries that can scale efficiently to billions of tuples and gracefully handle tight constraints. Progressive Shading solves a sequence of optimization problems over a hierarchy of relations, each…
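
To make the equivalence between package queries and ILPs concrete, here is a minimal sketch that solves a toy package query by exhaustive enumeration. It is not Progressive Shading or SketchRefine, and it would not scale beyond a handful of tuples. The `meals` relation, its attribute names, and the `best_package` helper are all hypothetical and chosen only for illustration. The query is: pick a multiset of meals that maximizes total protein, subject to total kcal not exceeding a bound and each tuple appearing at most `max_repeat` times.

```python
from itertools import product

# Hypothetical toy relation; attribute names are assumptions for illustration.
meals = [
    {"name": "salad", "protein": 5, "kcal": 150},
    {"name": "chicken", "protein": 30, "kcal": 400},
    {"name": "rice", "protein": 4, "kcal": 200},
]


def best_package(tuples, max_kcal, max_repeat):
    """Brute-force the ILP behind a toy package query.

    Decision variable x_i is the multiplicity of tuple i in the package.
    Maximize   sum_i x_i * protein_i
    subject to sum_i x_i * kcal_i <= max_kcal and 0 <= x_i <= max_repeat.
    """
    best_value, best_counts = None, None
    # Enumerate every assignment of multiplicities (exponential; toy sizes only).
    for counts in product(range(max_repeat + 1), repeat=len(tuples)):
        total_kcal = sum(c * t["kcal"] for c, t in zip(counts, tuples))
        if total_kcal > max_kcal:
            continue  # violates the linear constraint
        value = sum(c * t["protein"] for c, t in zip(counts, tuples))
        if best_value is None or value > best_value:
            best_value, best_counts = value, counts
    return best_value, best_counts


value, counts = best_package(meals, max_kcal=1000, max_repeat=2)
print(value, counts)  # 65 (1, 2, 0): one salad, two chicken, no rice
```

The search space here has `(max_repeat + 1) ** n` candidates for `n` tuples, which is why real systems hand the problem to an ILP solver and why the paper's focus is on shrinking the problem before solving it.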

Code & Models

Repositories

alm818/packagequery (official)

Taxonomy

Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Constraint Satisfaction and Optimization