Data Science: a Natural Ecosystem
Emilio Porcu (KUSTAR), Roy El Moukari (KUSTAR), Laurent Najman (KUSTAR, LIGM), Francisco Herrera (UGR), Horst Simon (ADIA)

TL;DR
This paper presents a systemic, data-centric view of data science as a natural ecosystem, emphasizing the integration of data complexities, agent roles, and interdisciplinary missions within a formal architecture.
Contribution
It introduces a formal ecosystem model of data science, defines discipline-induced data science and pan-data science, and proposes a fusion-oriented architecture for integrating heterogeneous knowledge and workflows.
Findings
Formal ecosystem model of data science
Definition of discipline-induced and pan-data science
A general-purpose architecture for heterogeneous knowledge integration
Abstract
This manuscript provides a systemic and data-centric view of what we term essential data science, as a natural ecosystem with challenges and missions stemming from the fusion of the data universe, with its multiple combinations of the 5D complexities (data structure, domain, cardinality, causality, and ethics), with the phases of the data life cycle. Data agents perform tasks driven by specific goals. The data scientist is an abstract entity that comes from the logical organization of data agents and their actions. Data scientists face challenges that are defined according to the missions. We define specific discipline-induced data science, which in turn allows for the definition of pan-data science, a natural ecosystem that integrates specific disciplines with essential data science. We semantically split essential data science into computational and foundational. By formalizing…
Taxonomy
Topics: Big Data and Business Intelligence
