The End of an Architectural Era for Analytical Databases
Reynold S. Xin

TL;DR
The paper argues that traditional monolithic data warehouses are inflexible and outdated, and proposes a modular, high-performance design that integrates SQL and machine learning, drawing on the Shark system built at Berkeley.
Contribution
The paper introduces a redesigned data warehouse architecture that combines the strengths of relational databases and the Hadoop ecosystem, exemplified by the Shark system.
Findings
Traditional warehouses are inflexible and slow to adapt.
Hadoop systems underutilize memory and have naive execution engines.
A new modular, high-performance warehouse design is proposed.
Abstract
Traditional enterprise warehouse solutions center around an analytical database system that is monolithic and inflexible: data needs to be extracted, transformed, and loaded into the rigid relational form before analysis. It takes years of sophisticated planning to provision and deploy a warehouse; adding new hardware resources to an existing warehouse is an equally lengthy and daunting task. Additionally, modern data analysis employs statistical methods that go well beyond the typical roll-up and drill-down capabilities provided by warehouse systems. Although it is possible to implement such methods using a combination of SQL and UDFs, query engines in relational databases are ill-suited for these. The Hadoop ecosystem introduces a suite of tools for data analytics that overcome some of the problems of traditional solutions. These systems, however, forgo years of warehouse…
Taxonomy
Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Big Data and Business Intelligence
