Overview and Prospects of Using Integer Surrogate Keys for Data Warehouse Performance Optimization

Sviatoslav Stumpf; Vladislav Povyshev

arXiv:2511.14502·cs.DB·November 19, 2025

Open Access

TL;DR

This paper examines integer surrogate keys for datetime data in data warehouses. Replacing standard DATE and TIMESTAMP types with 32- and 64-bit integer formats reduces storage by 30-60 percent and speeds up query execution by 25-40 percent, with the proposed algorithms validated on real-world workloads.

Contribution

It introduces practical formats and algorithms for using integer surrogate keys for time, showing substantial storage and performance benefits over standard date and timestamp types.

Findings

01. Integer formats reduce storage requirements by 30-60 percent.

02. Query execution is 25-40 percent faster.

03. The indexing, aggregation, compression, and batching algorithms achieve up to an eightfold increase in throughput.

Abstract

The aim of this paper is to examine and demonstrate how integer-based datetime labels (integer surrogate keys for time) can optimize data-warehouse and time-series performance, proposing practical formats and algorithms and validating their efficiency on real-world workloads. It is shown that replacing standard DATE and TIMESTAMP types with 32- and 64-bit integer formats reduces storage requirements by 30-60 percent and speeds up query execution by 25-40 percent. The paper presents indexing, aggregation, compression, and batching algorithms demonstrating up to an eightfold increase in throughput. Practical examples from finance, telecommunications, IoT, and scientific research confirm the efficiency and versatility of the proposed approach.
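
The abstract does not spell out the exact integer layouts, so what follows is only a minimal sketch under stated assumptions, not the authors' format. It assumes a YYYYMMDD encoding for a 32-bit date key and microseconds since the Unix epoch for a 64-bit timestamp key. All function names here are hypothetical and chosen for illustration.

```python
from datetime import date, datetime, timedelta, timezone

# Assumed reference point for the 64-bit key (not taken from the paper).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def date_to_key(d: date) -> int:
    """Encode a date as a YYYYMMDD integer; the value fits in 32 bits."""
    return d.year * 10000 + d.month * 100 + d.day


def key_to_date(key: int) -> date:
    """Decode a YYYYMMDD integer back into a date."""
    year, rest = divmod(key, 10000)
    month, day = divmod(rest, 100)
    return date(year, month, day)


def timestamp_to_key(ts: datetime) -> int:
    """Encode a timezone-aware datetime as microseconds since the Unix epoch."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return (ts - EPOCH) // timedelta(microseconds=1)


def key_to_timestamp(key: int) -> datetime:
    """Decode a microsecond key back into a UTC datetime."""
    return EPOCH + timedelta(microseconds=key)


def keys_in_month(keys: list[int], year: int, month: int) -> list[int]:
    """Select date keys of one month using integer comparisons only."""
    low = year * 10000 + month * 100  # e.g. 20251100
    high = low + 99                   # e.g. 20251199
    return [k for k in keys if low <= k <= high]
```

Both encodings keep chronological order under plain integer comparison, so a range filter such as `keys_in_month` needs no date parsing. Whether this matches the formats evaluated in the paper is an assumption.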

Taxonomy

Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Time Series Analysis and Forecasting