Koalja: from Data Plumbing to Smart Workspaces in the Extended Cloud
Mark Burgess, Ewout Prangsma

TL;DR
Koalja is a Kubernetes-based platform that simplifies data pipeline development, ensuring transparency, provenance tracking, and energy-efficient processing for scalable, sustainable data workflows in cloud and edge environments.
Contribution
It introduces a user-friendly, serverless data wiring platform with comprehensive provenance tracking, optimized for sustainability and for integration with edge computing.
Findings
Enables transparent, serverless data pipeline development on Kubernetes.
Provides full provenance and forensic reconstruction of data processes.
Supports energy-efficient data processing and scaling for edge and IoT applications.
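To make the provenance claim concrete, the following is a minimal sketch of how a payload could carry its own history as it passes between plugin tasks. It assumes that each task records its name, its software version, and content fingerprints of its input and output. The names `Payload`, `Annotation`, `run_task` and `verify_chain` are hypothetical illustrations and are not Koalja's actual API or metadata schema.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Annotation:
    """Hypothetical provenance record: which task (and version) turned which input into which output."""
    task: str
    version: str
    input_digest: str
    output_digest: str


@dataclass
class Payload:
    """Hypothetical data payload that carries its provenance history between tasks."""
    data: bytes
    history: List[Annotation] = field(default_factory=list)


def digest(data: bytes) -> str:
    """Content fingerprint used to link records (SHA-256, truncated for readability)."""
    return hashlib.sha256(data).hexdigest()[:16]


def run_task(name: str, version: str, fn: Callable[[bytes], bytes], payload: Payload) -> Payload:
    """Apply a user plugin function and return a new payload with one more provenance record."""
    out = fn(payload.data)
    record = Annotation(name, version, digest(payload.data), digest(out))
    return Payload(out, payload.history + [record])


def verify_chain(payload: Payload) -> bool:
    """Forensic check: each record's output must be the next record's input,
    and the last recorded output must match the data currently held."""
    for prev, nxt in zip(payload.history, payload.history[1:]):
        if prev.output_digest != nxt.input_digest:
            return False
    return not payload.history or payload.history[-1].output_digest == digest(payload.data)


if __name__ == "__main__":
    p = Payload(b"Raw Sensor Reading 42")
    p = run_task("normalize", "v1.2.0", lambda d: d.lower(), p)
    p = run_task("strip-spaces", "v0.9.1", lambda d: d.replace(b" ", b""), p)
    for r in p.history:
        print(f"{r.task}@{r.version}: {r.input_digest} -> {r.output_digest}")
    print("chain intact:", verify_chain(p))
```

Because every record names the task version that produced each output, a result can be traced back through the chain to the exact software that led to it. This is the kind of forensic reconstruction the findings describe, shown here only in simplified, in-memory form.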
Abstract
Koalja describes a generalized data wiring or 'pipeline' platform for plugin user code, built on top of Kubernetes. Koalja makes the Kubernetes underlay transparent to users (for a 'serverless' experience) and offers a breadboarding experience for developing data-sharing circuitry, commoditizing its gradual promotion to a production system with a minimum of infrastructure knowledge. Enterprise-grade metadata are captured as data payloads flow through the circuitry, allowing full tracing of provenance and forensic reconstruction of transactional processes, down to the versions of software that led to each outcome. Koalja attends to optimizations for avoiding unwanted processing and transportation of data, which are rapidly becoming sustainability imperatives. Thus one can minimize energy expenditure and waste, and design with scaling in mind, especially with regard to edge…
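One common way to avoid unwanted processing is to skip work whose inputs have not changed. The sketch below assumes that a task's result is fully determined by its software version and its input content, so a repeated (version, input hash) pair can reuse the earlier result instead of recomputing it. `SkipUnchangedTask` is a hypothetical name chosen for illustration; the source does not specify how Koalja implements its optimizations.

```python
import hashlib
from typing import Callable, Dict, Tuple


class SkipUnchangedTask:
    """Hypothetical task wrapper: runs the user function only when the
    (software version, input content) pair has not been seen before."""

    def __init__(self, name: str, version: str, fn: Callable[[bytes], bytes]) -> None:
        self.name = name
        self.version = version
        self.fn = fn
        self._results: Dict[Tuple[str, str], bytes] = {}
        self.executions = 0  # how many times fn actually ran

    def __call__(self, data: bytes) -> bytes:
        # Key on both version and content, so upgrading the software forces a rerun.
        key = (self.version, hashlib.sha256(data).hexdigest())
        if key not in self._results:
            self.executions += 1
            self._results[key] = self.fn(data)
        return self._results[key]


if __name__ == "__main__":
    thumb = SkipUnchangedTask("thumbnail", "v1", lambda d: d[:8])
    frame = b"identical camera frame"
    for _ in range(3):
        thumb(frame)           # same input three times
    print(thumb.executions)    # -> 1
    thumb.version = "v2"       # a new software version invalidates old results
    thumb(frame)
    print(thumb.executions)    # -> 2
```

The in-memory dictionary grows without bound. A deployment at the edge would more plausibly use a bounded or persistent store, but the principle is the same: unchanged inputs cost neither computation nor retransmission.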
Taxonomy
Topics: Cloud Computing and Resource Management · Blockchain Technology Applications and Security · Scientific Computing and Data Management
