Accelerated Cloud for Artificial Intelligence (ACAI)
Dachi Chen, Weitian Ding, Chen Liang, Chang Xu, Junwei Zhang, Majd Sakr

TL;DR
ACAI is a comprehensive cloud platform that streamlines machine learning workflows by automating data management, resource provisioning, and experiment tracking, significantly improving efficiency and reducing costs for ML practitioners.
Contribution
This paper introduces ACAI, a novel end-to-end cloud-based ML platform with automated resource management, data versioning, and experiment tracking, enhancing productivity over existing solutions.
Findings
Auto-provisioner achieves 1.7x speed-up on MNIST
System reduces experiment time by 20%
Cost is reduced by 39% with ACAI
Abstract
Training an effective machine learning (ML) model is an iterative process that requires effort in multiple dimensions. Vertically, a single pipeline typically includes an initial ETL (Extract, Transform, Load) of raw datasets, a model training stage, and an evaluation stage where practitioners obtain statistics on model performance. Horizontally, many such pipelines may be required to find the best model within a search space of model configurations. Many practitioners resort to maintaining logs manually and writing simple glue code to automate the workflow. However, carrying out this process on the cloud is not a trivial task: it involves resource provisioning, data management, and bookkeeping of job histories to make sure the results are reproducible. We propose an end-to-end cloud-based machine learning platform, Accelerated Cloud for AI (ACAI), to help improve the…
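To make the "vertical" and "horizontal" dimensions concrete, the following is a minimal local sketch of the kind of glue code the abstract describes practitioners writing by hand. It is not ACAI's API: every function name (`etl`, `train`, `evaluate`, `run_pipeline`, `run_search`), the toy one-weight model, and the record layout are assumptions made purely for illustration.

```python
import hashlib
import itertools
import json


def etl(raw):
    """ETL stand-in (hypothetical): scale raw values into [0, 1]."""
    hi = max(raw)
    return [x / hi for x in raw]


def train(data, lr, epochs):
    """Toy training stage: fit one weight to the data mean by gradient descent."""
    target = sum(data) / len(data)
    w = 0.0
    for _ in range(epochs):
        w -= lr * 2 * (w - target)  # gradient of (w - target)^2
    return w


def evaluate(w, data):
    """Evaluation stage: squared error between the weight and the data mean."""
    target = sum(data) / len(data)
    return (w - target) ** 2


def run_pipeline(raw, config):
    """Vertical dimension: ETL -> train -> evaluate for one configuration."""
    data = etl(raw)
    model = train(data, **config)
    return evaluate(model, data)


def run_search(raw, grid):
    """Horizontal dimension: one pipeline per configuration in the grid.

    Each run is recorded with its configuration and a hash of the input
    data, so a result can later be traced back to what produced it.
    """
    data_hash = hashlib.sha256(json.dumps(raw).encode()).hexdigest()
    keys = sorted(grid)
    history = []
    for values in itertools.product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        history.append({
            "config": config,
            "data_hash": data_hash,
            "loss": run_pipeline(raw, config),
        })
    return history


history = run_search([2.0, 4.0, 6.0, 8.0],
                     {"lr": [0.01, 0.1], "epochs": [10, 100]})
best = min(history, key=lambda r: r["loss"])
print(best["config"], best["loss"])
```

Even in this toy form, the bookkeeping (configuration plus data hash per run) is what makes a result reproducible; on the cloud, resource provisioning for each run would add a further layer that this sketch does not attempt to model.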
Taxonomy
Topics: Cloud Computing and Resource Management
