Introducing Distributed Dynamic Data-intensive (D3) Science: Understanding Applications and Infrastructure
Shantenu Jha, Daniel S. Katz, Andre Luckow, Omer Rana, Yogesh Simmhan, Neil Chue Hong

TL;DR
This paper introduces the concept of Distributed Dynamic Data-intensive (D3) Science, highlighting the challenges and infrastructure needed for analyzing large, dynamic, and distributed datasets across scientific applications.
Contribution
It provides a comprehensive survey and a conceptual framework for understanding dynamic distributed data-intensive applications and their supporting infrastructure.
Findings
Identifies key characteristics of dynamic distributed data applications
Proposes a common framework for understanding application requirements
Examines existing infrastructure supporting D3 applications
Abstract
A common feature across many science and engineering applications is the amount and diversity of data and computation that must be integrated to yield insights. Data sets are growing larger and becoming distributed; and their location, availability and properties are often time-dependent. Collectively, these characteristics give rise to dynamic distributed data-intensive applications. While "static" data applications have received significant attention, the characteristics, requirements, and software systems for the analysis of large volumes of dynamic, distributed data, and data-intensive applications have received relatively less attention. This paper surveys several representative dynamic distributed data-intensive application scenarios, provides a common conceptual framework to understand them, and examines the infrastructure used in support of applications.
Taxonomy
Topics: Distributed and Parallel Computing Systems · Scientific Computing and Data Management · Advanced Data Storage Technologies
