Pruning Attribute Values From Data Cubes with Diamond Dicing
Hazel Webb, Owen Kaser, Daniel Lemire

TL;DR
This paper introduces the diamond dice, an operator that prunes a data cube along several dimensions at once, and shows that it can be computed over large fact tables in practical time.
Contribution
It proposes a novel operator, diamond dice, and demonstrates its effectiveness on large data sets with practical computation times.
Findings
Diamond cubes can be computed over fact tables containing 100 million facts in less than 35 minutes on a standard PC.
These are the first diamond-dicing experiments on large data sets.
Computing diamonds is challenging because the dimensions interact.
Abstract
Data stored in a data warehouse are inherently multidimensional, but most data-pruning techniques (such as iceberg and top-k queries) are unidimensional. However, analysts need to issue multidimensional queries. For example, an analyst may need to select not just the most profitable stores or--separately--the most profitable products, but simultaneous sets of stores and products fulfilling some profitability constraints. To fill this need, we propose a new operator, the diamond dice. Because of the interaction between dimensions, the computation of diamonds is challenging. We present the first diamond-dicing experiments on large data sets. Experiments show that we can compute diamond cubes over fact tables containing 100 million facts in less than 35 minutes using a standard PC.
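To make the stores-and-products example concrete, here is a minimal sketch of one way such a multidimensional pruning could be computed. The abstract does not spell out the constraint or the algorithm, so everything here is an assumption: the function name `diamond_dice`, the rule that every retained attribute value must have a slice sum (total profit) of at least a per-dimension threshold, the non-negative measures, and the naive repeat-until-stable loop. It is not the authors' implementation, and it illustrates why the dimensions interact: dropping a store lowers product totals, which can in turn disqualify other stores.

```python
from collections import defaultdict

def diamond_dice(facts, thresholds):
    """Hypothetical sketch: keep only facts whose attribute values all meet
    the per-dimension slice-sum threshold, repeating until nothing changes.

    facts: list of (dims_tuple, measure), measures assumed non-negative.
    thresholds: one minimum slice sum per dimension (an assumed constraint).
    """
    alive = list(facts)
    while True:
        # Sum the measure over every slice (dimension index, attribute value).
        sums = [defaultdict(float) for _ in thresholds]
        for dims, measure in alive:
            for d, value in enumerate(dims):
                sums[d][value] += measure
        # Keep a fact only if each of its attribute values meets its threshold.
        kept = [(dims, m) for dims, m in alive
                if all(sums[d][v] >= thresholds[d] for d, v in enumerate(dims))]
        if len(kept) == len(alive):
            return kept  # fixpoint: no slice was pruned in this pass
        alive = kept

# Made-up (store, product) facts with a profit measure.
facts = [
    (("s1", "p1"), 5), (("s1", "p2"), 5),
    (("s2", "p1"), 5), (("s2", "p2"), 5),
    (("s3", "p1"), 2), (("s3", "p3"), 9),
    (("s4", "p3"), 1),
]
result = diamond_dice(facts, thresholds=(10, 10))
stores = {dims[0] for dims, _ in result}
products = {dims[1] for dims, _ in result}
```

In this toy data, store s3 starts with a total of 11 and would survive a store-only filter, but removing s4 pushes product p3 below its threshold, which in turn drops s3. That cascade is the kind of interaction between dimensions that a single-dimension query such as an iceberg or top-k query does not capture.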
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Data Mining Algorithms and Applications
