Finding Your Way Through the Jungle of Big Data Architectures

Torsten Priebe; Sebastian Neumaier; Stefan Markus

arXiv:2201.04233·cs.DB·January 24, 2022

TL;DR

This paper systematically reviews common analytical data architectures (Gartner's Logical Data Warehouse, Data Fabric, and Dehghani's Data Mesh), describes them and their interdependencies using DAMA-DMBOK and ArchiMate, and outlines a pattern system intended to guide the choice of architecture for a given situation.

Contribution

It offers a first view of modern data architectures such as Data Mesh and Data Fabric, and sketches a pattern-based approach to architecture selection.

Findings

01 Provides a comparative analysis of analytical data architectures

02 Identifies interdependencies among these architectures

03 Outlines a planned pattern system for architecture selection

Abstract

This paper presents a systematic review of common analytical data architectures based on DAMA-DMBOK and ArchiMate. The paper is a work in progress and provides a first view of Gartner's Logical Data Warehouse paradigm, Data Fabric, and Dehghani's Data Mesh proposal, as well as their interdependencies. It furthermore sketches how this work can be extended by covering more architecture paradigms (including classic Data Warehouse, Data Vault, Data Lake, Lambda, and Kappa architectures) and by introducing a template with, among others, "context", "problem", and "solution" descriptions, leading ultimately to a pattern system that provides guidance for choosing the right architecture paradigm for the right situation.
