Ilargi: a GPU Compatible Factorized ML Model Training Framework

Wenbo Sun; Rihan Hai

arXiv:2502.01985 · cs.LG · February 5, 2025

TL;DR

Ilargi is a GPU-compatible factorized ML training framework that automates data integration and uses cost-based decision making to optimize computation, accelerating training in both CPU and GPU environments.

Contribution

Introduces Ilargi, the first GPU-compatible factorized learning framework that automates data integration and employs ML-based cost estimation for optimized training.

Findings

01. Achieves up to 8.9x speedups on GPUs.

02. Over 20% acceleration in batch ML workloads.

03. First to enable GPU-compatible factorized learning.

Abstract

Machine learning (ML) training over disparate data sources traditionally involves materialization, which can impose substantial time and space overhead due to data movement and replication. Factorized learning, which computes directly on the disparate sources through linear algebra (LA) rewriting, has emerged as a viable alternative that improves computational efficiency. However, adapting factorized learning to exploit the full capabilities of modern LA-friendly hardware such as GPUs has been limited, often requiring manual intervention for algorithm compatibility. This paper introduces Ilargi, a novel factorized learning framework that uses matrix-represented data integration (DI) metadata to enable automatic factorization across CPU and GPU environments without costly relational joins. Ilargi incorporates an ML-based cost estimator to intelligently…
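To make the contrast between materialization and LA rewriting concrete, here is a minimal NumPy sketch of the general idea behind factorized learning. It is not Ilargi's API or its metadata format: the tables `S` and `R`, the foreign-key array `fk`, the indicator matrix `K`, and both function names are assumptions introduced only for illustration. The sketch assumes a simple key/foreign-key join, so the joined table can be written as T = [S, K R], and a product with a weight vector can be pushed down to the sources as T w = S w_S + K (R w_R).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical source tables (illustrative only, not from the paper):
# S has 6 rows and 2 features; R has 3 rows and 4 features.
n_s, d_s = 6, 2
n_r, d_r = 3, 4
S = rng.normal(size=(n_s, d_s))
R = rng.normal(size=(n_r, d_r))
fk = np.array([0, 2, 1, 0, 2, 2])  # row i of S joins with row fk[i] of R

# Indicator matrix K (n_s x n_r): K[i, j] = 1 when row i of S matches row j of R.
# This is an assumed stand-in for matrix-represented integration metadata.
K = np.zeros((n_s, n_r))
K[np.arange(n_s), fk] = 1.0

w = rng.normal(size=d_s + d_r)  # model weights over the joined feature space


def materialized_predict(S, R, K, w):
    """Build the joined table T = [S, K R] explicitly, then multiply by w."""
    T = np.hstack([S, K @ R])
    return T @ w


def factorized_predict(S, R, K, w):
    """Push the product down to each source: T w = S w_S + K (R w_R)."""
    w_s, w_r = w[: S.shape[1]], w[S.shape[1]:]
    return S @ w_s + K @ (R @ w_r)


y_mat = materialized_predict(S, R, K, w)
y_fac = factorized_predict(S, R, K, w)
print(np.allclose(y_mat, y_fac))
```

Both functions return the same vector up to floating-point rounding, but the factorized version never builds the joined table: `R @ w_r` is computed once per row of `R` instead of once per joined row. Whether that saves time depends on how much redundancy the join introduces, which is the kind of trade-off a cost-based decision would have to weigh.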

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques