Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals
Jim Gray, Surajit Chaudhuri, Adam Bosworth, Andrew Layman, Don Reichart, Murali Venkatrao, Frank Pellow, Hamid Pirahesh

TL;DR
The paper introduces the data cube operator, an N-dimensional generalization of SQL's aggregation and GROUP BY. Because its result is itself a relation, the cube can be embedded in larger, non-procedural data analysis queries.
Contribution
It defines the data cube operator, whose output is a relation, explains how the operator fits into SQL, and discusses techniques for computing cubes efficiently.
Findings
Cubes are relations that enable N-dimensional data analysis.
The paper details how to implement and optimize cube computations.
Many of the cube and roll-up features are being added to the SQL Standard.
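One common way to make cube computation cheaper is to scan the base data only once, building the finest-grained group-by (the "core"), and then derive each coarser group-by from an already computed parent that has exactly one more dimension. This works for distributive aggregates such as SUM, COUNT, MIN and MAX, but not for holistic ones such as MEDIAN. The Python sketch below is a minimal illustration under those assumptions, not the paper's algorithm: the function names `cube_from_core` and `aggregate_from`, the "pick the smallest parent" heuristic, the `"ALL"` placeholder string, and the sample data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # hypothetical placeholder for a dimension that has been aggregated away


def aggregate_from(parent, drop_index):
    """Collapse one dimension of a parent group-by to ALL and re-sum (SUM is distributive)."""
    out = defaultdict(int)
    for key, total in parent.items():
        new_key = key[:drop_index] + (ALL,) + key[drop_index + 1:]
        out[new_key] += total
    return dict(out)


def cube_from_core(rows, dims, measure):
    """Return {frozenset of kept dimension indices: {key tuple: SUM(measure)}}."""
    n = len(dims)
    # The core (finest group-by) is the only pass over the base rows.
    core = defaultdict(int)
    for row in rows:
        core[tuple(row[d] for d in dims)] += row[measure]
    views = {frozenset(range(n)): dict(core)}
    # Walk the lattice from n-1 kept dimensions down to 0; every parent
    # (one more kept dimension) has already been computed at this point.
    for k in range(n - 1, -1, -1):
        for kept in combinations(range(n), k):
            keep = frozenset(kept)
            parents = [keep | {i} for i in range(n) if i not in keep]
            # Heuristic: derive from the parent with the fewest rows.
            parent = min(parents, key=lambda p: len(views[p]))
            (dropped,) = parent - keep
            views[keep] = aggregate_from(views[parent], dropped)
    return views


# Hypothetical sample data.
sales = [
    {"model": "Chevy", "year": 1994, "color": "black", "units": 50},
    {"model": "Chevy", "year": 1995, "color": "white", "units": 40},
    {"model": "Ford", "year": 1994, "color": "black", "units": 60},
    {"model": "Ford", "year": 1995, "color": "white", "units": 20},
]
views = cube_from_core(sales, ("model", "year", "color"), "units")
cells = [key + (total,) for view in views.values() for key, total in view.items()]
```

Only the core touches `sales`; the other seven group-bys are computed from smaller intermediate results, and `cells` flattens them into a single relation.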
Abstract
Data analysis applications typically aggregate data across many dimensions looking for anomalies or unusual patterns. The SQL aggregate functions and the GROUP BY operator produce zero-dimensional or one-dimensional aggregates. Applications need the N-dimensional generalization of these operators. This paper defines that operator, called the data cube or simply cube. The cube operator generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers. The novelty is that cubes are relations. Consequently, the cube operator can be imbedded in more complex non-procedural data analysis programs. The cube operator treats each of the N aggregation attributes as a dimension of N-space. The aggregate of a particular set of attribute values is a point in this space. The set of points forms an N-dimensional cube. Super-aggregates are computed…
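As a concrete reading of the abstract: for N grouping attributes the cube is the union of the 2^N GROUP BYs over every subset of those attributes, while a roll-up keeps only the N+1 prefixes of the attribute list. The Python sketch below illustrates this under stated assumptions. It uses SUM as the aggregate, marks an aggregated-away dimension with the string `"ALL"`, and the helpers `group_by`, `cube` and `rollup` as well as the sample rows are hypothetical, not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # hypothetical placeholder for a dimension that has been aggregated away


def group_by(rows, dims, keep, measure):
    """SUM(measure) grouped by the dimensions in `keep`; all other dims become ALL."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] if d in keep else ALL for d in dims)
        totals[key] += row[measure]
    return [key + (total,) for key, total in totals.items()]


def cube(rows, dims, measure):
    """Union of the 2^N group-bys, one per subset of `dims`."""
    result = []
    for k in range(len(dims), -1, -1):
        for keep in combinations(dims, k):
            result.extend(group_by(rows, dims, set(keep), measure))
    return result


def rollup(rows, dims, measure):
    """Only the N+1 prefixes of `dims`: (d1..dN), (d1..dN-1), ..., ()."""
    result = []
    for k in range(len(dims), -1, -1):
        result.extend(group_by(rows, dims, set(dims[:k]), measure))
    return result


# Hypothetical sample data.
sales = [
    {"model": "Chevy", "year": 1994, "color": "black", "units": 50},
    {"model": "Chevy", "year": 1995, "color": "white", "units": 40},
    {"model": "Ford", "year": 1994, "color": "black", "units": 60},
    {"model": "Ford", "year": 1995, "color": "white", "units": 20},
]
dims = ("model", "year", "color")
cube_rows = cube(sales, dims, "units")
rollup_rows = rollup(sales, dims, "units")
```

Every result is a flat list of tuples, so it can be filtered or joined like any other relation. For example, `(ALL, ALL, ALL, 170)` is the grand total (the zero-dimensional point), and `("Chevy", ALL, ALL, 90)` is a one-dimensional super-aggregate.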
Taxonomy
Topics: Advanced Database Systems and Queries · Data Management and Algorithms
