Attribute Value Reordering For Efficient Hybrid OLAP
Owen Kaser, Daniel Lemire

TL;DR
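This paper studies attribute value reordering (normalization) in data cubes as a way to improve storage efficiency in hybrid OLAP (HOLAP) systems. It presents complexity results, an exact algorithm for a special case, an optimal method for nearly independent dimensions, and heuristics for the general case, and it reports total storage gains of up to 44% over ROLAP.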
Contribution
It proves that computing an optimal normalization is NP-hard (even for 1x3 chunks), gives an exact algorithm for 1x2 chunks, and proposes heuristics for practical cases where dimensions are not independent, improving HOLAP storage efficiency.
Findings
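Normalization improves HOLAP storage efficiency by 9% to 13%, for a total gain of 29% to 44% over ROLAP.
Dimension-wise attribute frequency sorting is an optimal normalization when dimensions are nearly statistically independent.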
Heuristics outperform baseline methods in experiments.
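Dimension-wise attribute frequency sorting, the normalization the authors show to be optimal for nearly independent dimensions, can be sketched as below. This is a minimal illustration, not the authors' code: it assumes the cube is given as a list of coordinate tuples for its allocated cells, that the "frequency" of an attribute value is the number of allocated cells in its slice, and that ties are broken by original index. The helper names `frequency_sort_normalization` and `apply_normalization` are hypothetical. Sorting the n values of each of the d dimensions costs O(n log n), giving the O(d n log(n)) bound plus one counting pass over the allocated cells.

```python
from collections import Counter


def frequency_sort_normalization(cells, n, d):
    """Hypothetical helper: one permutation per dimension, most frequent value first.

    cells: list of d-tuples, the coordinates of allocated (nonzero) cells.
    n: number of attribute values per dimension (cube of size n^d).
    Returns perms, where perms[k][old] is the new index of value `old` in dimension k.
    """
    perms = []
    for k in range(d):
        # Frequency of each attribute value in dimension k (absent values count as 0).
        counts = Counter(cell[k] for cell in cells)
        # Decreasing frequency; ties broken by original index (an arbitrary assumption).
        order = sorted(range(n), key=lambda v: (-counts[v], v))
        new_index = [0] * n
        for new, old in enumerate(order):
            new_index[old] = new
        perms.append(new_index)
    return perms


def apply_normalization(cells, perms):
    """Relabel every cell's coordinates with the per-dimension permutations."""
    return [tuple(perms[k][v] for k, v in enumerate(cell)) for cell in cells]


cells = [(0, 2), (1, 2), (2, 2), (2, 0), (2, 1)]
perms = frequency_sort_normalization(cells, n=3, d=2)
print(perms)                                      # [[1, 2, 0], [1, 2, 0]]
print(sorted(apply_normalization(cells, perms)))  # [(0, 0), (0, 1), (0, 2), (1, 0), (2, 0)]
```

After relabeling, the allocated cells cluster in the first row and first column, which is what tends to make some chunks dense enough to benefit from array storage.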
Abstract
The normalization of a data cube is the ordering of the attribute values. For large multidimensional arrays where dense and sparse chunks are stored differently, proper normalization can lead to improved storage efficiency. We show that it is NP-hard to compute an optimal normalization even for 1x3 chunks, although we find an exact algorithm for 1x2 chunks. When dimensions are nearly statistically independent, we show that dimension-wise attribute frequency sorting is an optimal normalization and takes time O(d n log(n)) for data cubes of size n^d. When dimensions are not independent, we propose and evaluate several heuristics. The hybrid OLAP (HOLAP) storage mechanism is already 19%-30% more efficient than ROLAP, but normalization can improve it further by 9%-13% for a total gain of 29%-44% over ROLAP.
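To see why the ordering of attribute values matters when dense and sparse chunks are stored differently, the sketch below uses a simplified cost model. This model is an assumption for illustration and is not necessarily the one used in the paper. A dense chunk is stored as an array costing one unit per cell. A sparse chunk is stored as tuples of d coordinates plus one measure, costing d + 1 units per allocated cell. HOLAP picks the cheaper option per chunk, while ROLAP stores every allocated cell as a tuple. The function names are hypothetical.

```python
import math
from collections import Counter


def chunk_counts(cells, chunk_shape):
    # Chunk id = coordinate-wise floor division by the chunk shape.
    return Counter(tuple(c // m for c, m in zip(cell, chunk_shape)) for cell in cells)


def holap_cost(cells, chunk_shape):
    """Assumed HOLAP cost: each nonempty chunk is stored densely or as tuples, whichever is cheaper."""
    d = len(chunk_shape)
    volume = math.prod(chunk_shape)  # cells in one dense chunk
    return sum(min(volume, (d + 1) * k) for k in chunk_counts(cells, chunk_shape).values())


def rolap_cost(cells, d):
    """Assumed ROLAP cost: d coordinates plus one measure per allocated cell."""
    return (d + 1) * len(cells)


# A 4x4 cube with 2x2 chunks: each allocated cell lands in a different chunk.
before = [(0, 0), (0, 2), (2, 0), (2, 2)]
# Hypothetical normalization: swap attribute values 1 and 2 in both dimensions.
swap = [0, 2, 1, 3]
after = [(swap[i], swap[j]) for i, j in before]
print(holap_cost(before, (2, 2)), holap_cost(after, (2, 2)), rolap_cost(before, 2))  # 12 4 12
```

In this toy case the reordering packs all four cells into one full chunk, so HOLAP storage drops from 12 to 4 units while the ROLAP cost is unchanged, since ROLAP does not depend on the ordering.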
