Towards better data discovery and collection with flow-based programming
Andrei Paleyes, Christian Cabrera, Neil D. Lawrence

TL;DR
This paper investigates how flow-based programming can improve data discovery and collection for machine learning deployment, highlighting its potential to enhance data-centric infrastructure compared to service-oriented paradigms.
Contribution
It introduces the application of flow-based programming to data management in ML deployment and compares its benefits with traditional service-oriented approaches.
Findings
Flow-based programming (FBP) shows strong potential to provide data-centric infrastructure benefits for ML deployment.
Compared to service-oriented paradigms, FBP simplifies data discovery and collection.
The study highlights a trend prioritizing model development over data quality.
Abstract
Despite huge successes reported by the field of machine learning, such as voice assistants or self-driving cars, businesses still observe a very high failure rate when deploying ML in production. We argue that part of the reason is infrastructure that was not designed for data-oriented activities. This paper explores the potential of flow-based programming (FBP) for simplifying data discovery and collection in software systems. We compare FBP with the currently prevalent service-oriented paradigm to assess the characteristics of each paradigm in the context of ML deployment. We develop a data processing application, formulate a subsequent ML deployment task, and measure the impact of implementing that task within both programming paradigms. Our main conclusion is that FBP shows great potential for providing data-centric infrastructural benefits for deployment of ML.…
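To make the abstract's claim about data discovery and collection more concrete, here is a minimal sketch of the flow-based idea in Python. It is not the authors' implementation: the `Graph` class, the `tap` method, and the `parse`, `price`, and `store` components are hypothetical names invented for illustration, and the fare formula is made up. The point it tries to show is that when components only communicate through explicit connections, the data moving between them can be found by inspecting the graph and collected by tapping a connection, without editing any component.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

Packet = dict


class Graph:
    """Toy flow-based program: nodes are functions, edges carry data packets."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Callable[[Packet], Packet]] = {}
        self.edges: Dict[str, List[str]] = {}  # source node -> downstream nodes
        self.taps: Dict[Tuple[str, str], List[Packet]] = {}  # tapped edge -> copies

    def add_node(self, name: str, fn: Callable[[Packet], Packet]) -> None:
        self.nodes[name] = fn

    def connect(self, src: str, dst: str) -> None:
        self.edges.setdefault(src, []).append(dst)

    def tap(self, src: str, dst: str) -> List[Packet]:
        """Collect a copy of every packet sent over the src -> dst connection."""
        return self.taps.setdefault((src, dst), [])

    def run(self, entry: str, packets: List[Packet]) -> None:
        queue = deque((entry, p) for p in packets)
        while queue:
            name, packet = queue.popleft()
            out = self.nodes[name](packet)
            for dst in self.edges.get(name, []):
                if (name, dst) in self.taps:
                    self.taps[(name, dst)].append(dict(out))
                queue.append((dst, out))


# Hypothetical components of a small data processing application.
def parse(p: Packet) -> Packet:
    return {"ride_id": p["id"], "distance_km": float(p["km"])}


def price(p: Packet) -> Packet:
    # Made-up fare rule, used only to give the node something to compute.
    return {**p, "fare": round(2.0 + 1.5 * p["distance_km"], 2)}


stored: List[Packet] = []


def store(p: Packet) -> Packet:
    stored.append(p)
    return p


g = Graph()
g.add_node("parse", parse)
g.add_node("price", price)
g.add_node("store", store)
g.connect("parse", "price")
g.connect("price", "store")

# Data discovery: the available data streams are simply the graph's edges.
# Data collection: tap one of them; no component code changes.
collected = g.tap("parse", "price")

g.run("parse", [{"id": 1, "km": "3.0"}, {"id": 2, "km": "10"}])
```

After the run, `collected` holds the parsed ride records that flowed from `parse` to `price`, which could serve as training data for a later ML task. In a service-oriented design, the equivalent data would typically sit behind each service's own API or storage, so collecting it would usually mean adding endpoints or instrumentation to the services themselves.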
Taxonomy
Topics: Software Engineering Research · Software System Performance and Reliability · Data Stream Mining Techniques
