CubicML: Automated ML for Large ML Systems Co-design with ML Prediction of Performance

Wei Wen; Quanyu Zhu; Weiwei Chu; Wen-Yen Chen; Jiyan Yang

arXiv:2409.04585·cs.LG·September 24, 2024

TL;DR

CubicML introduces an ML-based approach to automatically optimize the training performance of large distributed ML systems, significantly improving training speed for models with billions of parameters.

Contribution

The paper presents an ML-driven method that uses a learned predictor of system performance to search co-design hyper-parameters of large ML systems efficiently, addressing the rapid growth of the hyper-parameter space as systems scale.

Findings

01. Optimized training speed for 73-billion-parameter recommendation models.

02. Achieved efficient training for 405-billion-parameter language models.

03. Demonstrated effectiveness on Meta's large-scale systems.

Abstract

Scaling up deep learning models has proven effective at improving the intelligence of machine learning (ML) models, especially industry recommendation models and large language models. The co-design of large distributed ML systems and algorithms (to maximize training performance) plays a pivotal role in this success. As these systems scale, the number of co-design hyper-parameters grows rapidly, which makes it challenging to find the optimal setup for maximizing system performance. In this paper, we propose CubicML, which uses ML to automatically optimize the training performance of large distributed ML systems. In CubicML, we use an ML model as a proxy to predict training performance, for search efficiency and performance-modeling flexibility. We show that CubicML can effectively optimize the training speed of in-house ads recommendation models with 73 billion parameters and large language…
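The idea of using an ML model as a cheap proxy for training performance can be illustrated with a small predictor-guided search loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: the knob names in `SEARCH_SPACE`, the synthetic `measure_throughput` function, and the k-nearest-neighbor predictor are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import itertools
import random

# Hypothetical co-design knobs; names and values are illustrative only.
SEARCH_SPACE = {
    "batch_size": [256, 512, 1024, 2048],
    "num_shards": [1, 2, 4, 8],
    "prefetch_depth": [1, 2, 4],
}
KEYS = list(SEARCH_SPACE)
ALL_CONFIGS = list(itertools.product(*SEARCH_SPACE.values()))


def measure_throughput(cfg):
    """Synthetic stand-in for launching a real (expensive) training job."""
    batch_size, num_shards, prefetch_depth = cfg
    scaling = num_shards / (1.0 + 0.1 * num_shards ** 2)  # peaks, then degrades
    return batch_size ** 0.5 * scaling * (1.0 + 0.05 * prefetch_depth)


def encode(cfg):
    """Map each knob value to its normalized index in the search space."""
    return [
        SEARCH_SPACE[k].index(v) / (len(SEARCH_SPACE[k]) - 1)
        for k, v in zip(KEYS, cfg)
    ]


def predict(history, cfg, k=3):
    """Assumed proxy: mean measured throughput of the k closest known configs."""
    x = encode(cfg)
    by_distance = sorted(
        history,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(x, encode(item[0]))),
    )
    nearest = by_distance[:k]
    return sum(score for _, score in nearest) / len(nearest)


def search(rounds=5, per_round=3, pool=20, n_init=4, seed=0):
    rng = random.Random(seed)
    # Seed the history with a few randomly chosen, actually measured configs.
    history = [(cfg, measure_throughput(cfg)) for cfg in rng.sample(ALL_CONFIGS, n_init)]
    for _ in range(rounds):
        measured = {cfg for cfg, _ in history}
        unmeasured = [cfg for cfg in ALL_CONFIGS if cfg not in measured]
        candidates = rng.sample(unmeasured, min(pool, len(unmeasured)))
        # Rank cheaply with the proxy, then spend real measurements on the top few.
        candidates.sort(key=lambda cfg: predict(history, cfg), reverse=True)
        for cfg in candidates[:per_round]:
            history.append((cfg, measure_throughput(cfg)))
    best_cfg, best_score = max(history, key=lambda item: item[1])
    return dict(zip(KEYS, best_cfg)), best_score, history


if __name__ == "__main__":
    best_cfg, best_score, history = search()
    print(f"measured {len(history)} of {len(ALL_CONFIGS)} configs")
    print(f"best config: {best_cfg}, throughput: {best_score:.2f}")
```

In this toy setting the whole space could simply be enumerated and measured; the loop is only meant to show the shape of the approach, where each round measures a handful of configurations that the predictor ranks highest and then feeds the results back into the predictor's history.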

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Neural Networks and Applications · Machine Learning and Data Classification

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings