Diamond Dicing

Hazel Webb; Daniel Lemire; Owen Kaser

arXiv:1006.3726·cs.DB·July 25, 2014


TL;DR

The paper introduces the diamond cube operator for OLAP, which selects data by applying thresholds on several dimensions simultaneously, and shows that a custom implementation can be far faster than computing the same result in SQL on popular database engines.

Contribution

It presents the diamond cube operator, which fills a gap among existing data warehouse operations, and compares several algorithms for computing it, including SQL implementations on a row-store and a column-store.

Findings

01

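The diamond cube operator applies thresholds on several dimensions at once, for example keeping only the products and shops that both meet their revenue thresholds.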

02

The authors' custom implementation can be a hundred times faster than SQL implementations on popular database engines, including a row-store and a column-store.

03

The algorithms are compared and tested on large data sets of more than 100 million facts.

Abstract

In OLAP, analysts often select an interesting sample of the data. For example, an analyst might focus on products bringing revenues of at least 100 000 dollars, or on shops having sales greater than 400 000 dollars. However, current systems do not allow the application of both of these thresholds simultaneously, selecting products and shops satisfying both thresholds. For such purposes, we introduce the diamond cube operator, filling a gap among existing data warehouse operations. Because of the interaction between dimensions, the computation of diamond cubes is challenging. We compare and test various algorithms on large data sets of more than 100 million facts. We find that while it is possible to implement diamonds in SQL, it is inefficient. Indeed, our custom implementation can be a hundred times faster than popular database engines (including a row-store and a column-store).
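To make the interaction between dimensions concrete, here is a minimal Python sketch of the products-and-shops example. It is not the paper's algorithm: it is a naive fixed-point loop that repeatedly drops facts whose product or shop falls below its threshold. The function name `diamond_dice`, the `(product, shop, revenue)` tuple layout, the use of `>=` for both thresholds, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def diamond_dice(facts, k_product, k_shop):
    """Naive sketch: keep only facts whose product total and shop total
    both meet their thresholds, recomputing totals until nothing changes.

    facts: iterable of (product, shop, revenue) tuples (hypothetical layout).
    """
    cells = list(facts)
    while True:
        # Recompute per-product and per-shop totals over the surviving facts.
        by_product = defaultdict(float)
        by_shop = defaultdict(float)
        for product, shop, revenue in cells:
            by_product[product] += revenue
            by_shop[shop] += revenue

        kept = [
            (p, s, r)
            for p, s, r in cells
            if by_product[p] >= k_product and by_shop[s] >= k_shop
        ]
        # Removing a shop can push a product below its threshold (and vice
        # versa), so stop only once a full pass removes nothing.
        if len(kept) == len(cells):
            return kept
        cells = kept


facts = [
    ("pen", "A", 60_000),
    ("pen", "B", 50_000),
    ("ink", "A", 400_000),
    ("ink", "B", 20_000),
    ("cap", "B", 120_000),
]
result = diamond_dice(facts, k_product=100_000, k_shop=400_000)
print(result)  # [('ink', 'A', 400000)]
```

In this made-up data, "pen" initially clears the product threshold with 110 000 in total, but once shop B is removed for falling short of 400 000, pen's remaining revenue is only 60 000 and it is dropped as well. A single pass of independent filters would miss this cascade, which is one way to read the abstract's remark that the interaction between dimensions makes the computation challenging.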
