Navigating the challenges in creating complex data systems: a development philosophy
Sören Dittmer, Michael Roberts, Julian Gilbey, Ander Biguri, AIX-COVNET Collaboration, Jacobus Preller, James H.F. Rudd, John A.D. Aston, Carola-Bibiane Schönlieb

TL;DR
This paper discusses the increasing complexity of developing trustworthy data science systems, highlighting systemic issues and proposing incremental development and feedback loops as key philosophies to improve system reliability and reproducibility.
Contribution
It introduces a development philosophy emphasizing incremental growth and dual feedback loops to address challenges in building complex data systems.
Findings
Incremental development improves system reliability.
Dual feedback loops enhance correctness and efficacy.
Applying software engineering principles benefits data science systems.
Abstract
In this perspective, we argue that despite the democratization of powerful tools for data science and machine learning over the last decade, developing the code for a trustworthy and effective data science system (DSS) is getting harder. Perverse incentives and a lack of widespread software engineering (SE) skills are among the many root causes we identify that naturally give rise to the current systemic crisis in reproducibility of DSSs. We analyze why SE and building large complex systems are, in general, hard. Based on these insights, we identify how SE addresses those difficulties and how we can apply and generalize SE methods to construct DSSs that are fit for purpose. We advocate two key development philosophies, namely that one should incrementally grow -- not biphasically plan and build -- DSSs, and one should always employ two types of feedback loops during development: one which…
