Putting Data Science Pipelines on the Edge

Ali Akoglu; Genoveva Vargas-Solar

arXiv:2103.07978 · cs.DB · March 16, 2021

TL;DR

This paper introduces JITA-4DS, a flexible architecture for data science pipelines that dynamically configures disaggregated data centers to meet changing workload requirements and SLOs.

Contribution

It presents a novel composable architecture and resource management techniques for disaggregated data centers tailored for data science pipelines.

Findings

01. Demonstrates dynamic assembly of pipeline components based on workload needs.

02. Models and validates large-scale disaggregated data center performance.

03. Shows improved SLO adherence through application-aware resource management.

Abstract

This paper proposes a composable "Just in Time Architecture" for Data Science (DS) pipelines, named JITA-4DS, together with resource management techniques for configuring disaggregated data centers (DCs). Under our approach, DCs are composable through vertical integration of the application, middleware/operating system, and hardware layers, which are customized dynamically to meet application Service Level Objectives (SLOs), i.e., application-aware management. Pipelines thereby use a set of flexible building blocks that can be dynamically and automatically assembled and re-assembled as the workload's SLOs change. To assess disaggregated DCs, we study how to model and validate their performance in large-scale settings.
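The idea of assembling and re-assembling building blocks against an SLO can be illustrated with a minimal sketch. It is not the JITA-4DS resource manager: the `Block`, `Stage`, `pick_block`, and `assemble` names, the speedup and cost figures, and the "cheapest block that meets the latency SLO" rule are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical building block: a bundle of disaggregated resources."""
    name: str
    speedup: float  # assumed throughput relative to a baseline block
    cost: float     # assumed cost in arbitrary units


@dataclass(frozen=True)
class Stage:
    """One pipeline stage with a baseline runtime and a latency SLO."""
    name: str
    base_seconds: float
    slo_seconds: float


def pick_block(stage, pool):
    """Return the cheapest block whose estimated runtime meets the SLO, else None."""
    feasible = [b for b in pool if stage.base_seconds / b.speedup <= stage.slo_seconds]
    return min(feasible, key=lambda b: b.cost) if feasible else None


def assemble(stages, pool):
    """Map each stage name to the chosen block name (None if no block fits)."""
    plan = {}
    for stage in stages:
        block = pick_block(stage, pool)
        plan[stage.name] = block.name if block else None
    return plan


# Illustrative pool; the numbers are made up, not measurements.
POOL = [
    Block("cpu-small", speedup=1.0, cost=1.0),
    Block("cpu-large", speedup=2.0, cost=3.0),
    Block("gpu", speedup=8.0, cost=10.0),
]

# Initial assembly.
stages = [Stage("ingest", 10.0, 12.0), Stage("train", 80.0, 20.0)]
plan = assemble(stages, POOL)

# The SLO of "ingest" tightens, so the pipeline is re-assembled.
tighter = [Stage("ingest", 10.0, 6.0), Stage("train", 80.0, 20.0)]
replan = assemble(tighter, POOL)

print(plan)
print(replan)
```

In this toy setting, tightening the SLO of one stage changes only that stage's block, which is the kind of incremental re-assembly the abstract describes. A real resource manager would also have to handle contention for shared resources, data movement between stages, and objectives other than latency.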
