On the effects of logical database design on database size, query complexity, query performance, and energy consumption
Toni Taipalus

TL;DR
This study empirically evaluates how normalization levels affect database size, query complexity, performance, and energy consumption using IMDb data and PostgreSQL. It finds that normalizing from 1NF to 2NF increases throughput and reduces energy consumption per transaction.
Contribution
It provides the first detailed empirical analysis of normalization effects on database efficiency, quantifying impacts on size, performance, and energy consumption.
Findings
Normalization from 1NF to 2NF reduces disk size by 10%.
Normalization from 1NF to 2NF increases throughput fourfold.
Energy consumption per transaction decreases by 74% with 1NF to 2NF normalization.
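To make the reported figures easier to compare, the short sketch below converts them into ratios relative to the 1NF baseline. The helper names are hypothetical, and the reading of "fourfold" as a multiplication factor of 4 (rather than an increase of 400%) is an assumption, not something the summary states explicitly.

```python
def remaining_fraction(percent_decrease: float) -> float:
    """Fraction of the baseline value left after a percentage decrease."""
    return 1.0 - percent_decrease / 100.0


def percent_increase(factor: float) -> float:
    """Percentage increase implied by a multiplication factor."""
    return (factor - 1.0) * 100.0


# Figures taken from the findings listed above (1NF -> 2NF).
size_ratio = remaining_fraction(10)    # disk size relative to 1NF
energy_ratio = remaining_fraction(74)  # energy per transaction relative to 1NF
throughput_gain = percent_increase(4)  # assumes "fourfold" means a factor of 4

print(f"2NF disk size is {size_ratio:.2f}x the 1NF size")
print(f"2NF energy per transaction is {energy_ratio:.2f}x the 1NF value")
print(f"2NF throughput is {throughput_gain:.0f}% higher than 1NF")
```

Under these assumptions, a 74% reduction means each transaction uses roughly a quarter of the baseline energy, which is in line with a fourfold throughput gain if power draw stays roughly constant. That last condition is a guess for illustration, not a result from the study.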
Abstract
Database normalization theory is the basis for logical design of relational databases. Normalization reduces data redundancy and consequently eliminates potential data anomalies, while increasing the computational cost of read operations. Despite decades' worth of applications of normalization theory, it remains largely unclear to what extent normalization affects database size and efficiency. In this study, we examine the effects of database normalization using the Internet Movie Database (IMDb) public dataset and PostgreSQL. The results indicate, rather intuitively, that (i) database size on disk is reduced through normalization from 1NF to 2NF by 10%, but not from 2NF to 4NF, (ii) the total number of tables and table rows increases monotonically from 1NF to 2NF to 4NF, and that (iii) query complexity increases with further normalization. Surprisingly, however, the results also…
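As a rough illustration of points (ii) and (iii), the sketch below decomposes a small 1NF table into 2NF and compares row counts and the queries needed to retrieve the same information. The schema, table names, and sample rows are hypothetical and are not the study's IMDb schema. It uses Python's built-in sqlite3 module with an in-memory database so that it is self-contained, whereas the study itself used PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical 1NF table with composite key (title_id, person_id).
# title_name depends only on title_id and person_name only on person_id,
# so both are partial dependencies that violate 2NF.
cur.execute("""
    CREATE TABLE cast_1nf (
        title_id INTEGER, person_id INTEGER,
        title_name TEXT, person_name TEXT, role TEXT,
        PRIMARY KEY (title_id, person_id))
""")
cur.executemany(
    "INSERT INTO cast_1nf VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "Film A", "Actor X", "lead"),
        (1, 11, "Film A", "Actor Y", "support"),
        (2, 10, "Film B", "Actor X", "lead"),
    ],
)

# 2NF decomposition: move the partially dependent columns to their own tables.
cur.execute("CREATE TABLE title (title_id INTEGER PRIMARY KEY, title_name TEXT)")
cur.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, person_name TEXT)")
cur.execute("""
    CREATE TABLE cast_2nf (
        title_id INTEGER REFERENCES title(title_id),
        person_id INTEGER REFERENCES person(person_id),
        role TEXT,
        PRIMARY KEY (title_id, person_id))
""")
cur.execute("INSERT INTO title SELECT DISTINCT title_id, title_name FROM cast_1nf")
cur.execute("INSERT INTO person SELECT DISTINCT person_id, person_name FROM cast_1nf")
cur.execute("INSERT INTO cast_2nf SELECT title_id, person_id, role FROM cast_1nf")

# The same question needs no join in 1NF but two joins in 2NF.
query_1nf = """
    SELECT title_name, person_name, role
    FROM cast_1nf ORDER BY title_id, person_id
"""
query_2nf = """
    SELECT t.title_name, p.person_name, c.role
    FROM cast_2nf AS c
    JOIN title AS t ON t.title_id = c.title_id
    JOIN person AS p ON p.person_id = c.person_id
    ORDER BY c.title_id, c.person_id
"""
result_1nf = cur.execute(query_1nf).fetchall()
result_2nf = cur.execute(query_2nf).fetchall()


def count_rows(table: str) -> int:
    """Return the number of rows in one of the tables created above."""
    return cur.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


total_rows_1nf = count_rows("cast_1nf")
total_rows_2nf = sum(count_rows(t) for t in ("title", "person", "cast_2nf"))
print("tables: 1 vs 3, total rows:", total_rows_1nf, "vs", total_rows_2nf)
```

In this toy case the normalized design has more tables and more rows in total, and the equivalent query needs two joins, yet each title and person name is stored only once. Whether that trade-off reduces size on disk depends on the data and the storage engine, which is the kind of effect the study measures empirically.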
Taxonomy
Topics: Advanced Database Systems and Queries · Cloud Computing and Resource Management · Data Management and Algorithms
