Overview and Prospects of Using Integer Surrogate Keys for Data Warehouse Performance Optimization
Sviatoslav Stumpf, Vladislav Povyshev

TL;DR
This paper explores the use of integer surrogate keys for datetime data in data warehouses, demonstrating significant improvements in storage efficiency and query performance through practical algorithms validated on real-world workloads.
Contribution
It introduces practical formats and algorithms for using integer surrogate keys for time, showing substantial storage and performance benefits over standard date and timestamp types.
Findings
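Replacing standard DATE and TIMESTAMP types with 32- and 64-bit integer formats reduces storage requirements by 30-60%.
The same replacement speeds up query execution by 25-40%.
The indexing, aggregation, compression, and batching algorithms achieve up to an eightfold increase in throughput.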
Abstract
The aim of this paper is to examine and demonstrate how integer-based datetime labels (integer surrogate keys for time) can optimize data-warehouse and time-series performance, proposing practical formats and algorithms and validating their efficiency on real-world workloads. It is shown that replacing standard DATE and TIMESTAMP types with 32- and 64-bit integer formats reduces storage requirements by 30-60 percent and speeds up query execution by 25-40 percent. The paper presents indexing, aggregation, compression, and batching algorithms demonstrating up to an eightfold increase in throughput. Practical examples from finance, telecommunications, IoT, and scientific research confirm the efficiency and versatility of the proposed approach.
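To make the idea of an integer surrogate key for time concrete, the following is a minimal Python sketch. The abstract does not specify the paper's actual formats, so the two encodings here are assumptions chosen for illustration: a `YYYYMMDD` date key that fits in a signed 32-bit integer, and a count of microseconds since the Unix epoch (UTC) that fits in a signed 64-bit integer. The function names are hypothetical.

```python
from datetime import date, datetime, timedelta, timezone

# Assumed reference point for the 64-bit key: the Unix epoch in UTC.
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def date_to_key(d: date) -> int:
    """Encode a date as a YYYYMMDD integer (assumed 32-bit format)."""
    return d.year * 10_000 + d.month * 100 + d.day


def key_to_date(key: int) -> date:
    """Decode a YYYYMMDD integer back into a date."""
    year, rest = divmod(key, 10_000)
    month, day = divmod(rest, 100)
    return date(year, month, day)


def timestamp_to_key(ts: datetime) -> int:
    """Encode a timezone-aware datetime as microseconds since the epoch
    (assumed 64-bit format). Naive datetimes raise TypeError here."""
    delta = ts - _EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def key_to_timestamp(key: int) -> datetime:
    """Decode a microsecond key back into a UTC datetime."""
    return _EPOCH + timedelta(microseconds=key)
```

With either encoding, chronological order matches integer order, so a date-range predicate becomes a plain integer comparison, for example `date_to_key(date(2024, 1, 1)) <= key <= date_to_key(date(2024, 12, 31))`. Whether this yields the storage and speed gains reported above depends on the database engine and workload.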
Taxonomy
Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Time Series Analysis and Forecasting
