Accelerating Transfer Learning with Near-Data Computation on Cloud Object Stores

Diana Petrescu; Arsany Guirguis; Do Le Quoc; Javier Picorel; Rachid Guerraoui; Florin Dinu

arXiv:2210.08650·cs.LG·November 4, 2024

TL;DR

This paper introduces HAPI, a system that accelerates transfer learning by leveraging near-data computation in cloud object stores, effectively reducing network bottlenecks and improving training speed.

Contribution

HAPI presents a novel approach to transfer learning that combines storage-side computation and optimized execution splitting to enhance performance and resource efficiency.

Findings

01 Achieves up to 2.5x training speed-up.

02 Selects near-optimal computation split points in 86.8% of cases.

03 Improves total transfer learning time by overlapping training iterations.
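To make the overlap finding concrete, here is a minimal timing sketch. It assumes an idealized two-stage pipeline in which every iteration spends a fixed time on the storage tier and a fixed time on the compute tier, and it ignores network transfer time. The function names and parameters are hypothetical and are not taken from HAPI.

```python
def sequential_time(n_iters, storage_s, compute_s):
    """Total time when each iteration finishes both tiers before the next starts."""
    return n_iters * (storage_s + compute_s)


def overlapped_time(n_iters, storage_s, compute_s):
    """Total time when the storage tier starts iteration i+1 while the
    compute tier is still working on iteration i (idealized two-stage pipeline).
    """
    if n_iters == 0:
        return 0
    # The first iteration pays both stages in full; every later iteration
    # costs only as much as the slower of the two stages.
    return storage_s + compute_s + (n_iters - 1) * max(storage_s, compute_s)


# Hypothetical numbers: 3 iterations, 1 s on storage, 3 s on compute.
print(sequential_time(3, 1, 3))  # 12
print(overlapped_time(3, 1, 3))  # 10
```

In this model the gain from overlapping grows with the number of iterations and is largest when the two tiers take about the same time per iteration.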

Abstract

Storage disaggregation underlies today's cloud and is naturally complemented by pushing down some computation to storage, thus mitigating the potential network bottleneck between the storage and compute tiers. We show how ML training benefits from storage pushdowns by focusing on transfer learning (TL), the widespread technique that democratizes ML by reusing existing knowledge on related tasks. We propose HAPI, a new TL processing system centered around two complementary techniques that address challenges introduced by disaggregation. First, applications must carefully balance execution across tiers for performance. HAPI judiciously splits the TL computation during the feature extraction phase yielding pushdowns that not only improve network time but also improve total TL training time by overlapping the execution of consecutive training iterations across tiers. Second, operators want…
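The idea of splitting the feature extraction phase across tiers can be illustrated with a small sketch. It is not HAPI's actual algorithm: it assumes a simple heuristic that runs a prefix of the frozen layers on the storage tier and picks the prefix whose output is smallest, subject to a storage-side compute budget. The `Layer` fields, the `choose_split` helper, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    output_bytes: int    # size of this layer's output per batch (hypothetical)
    compute_cost: float  # relative cost of running this layer (hypothetical)


def choose_split(layers, input_bytes, storage_budget):
    """Return (k, bytes_sent): run the first k layers on the storage tier.

    k == 0 means no pushdown, so the raw input crosses the network.
    """
    best_k, best_bytes = 0, input_bytes
    spent = 0.0
    for k, layer in enumerate(layers, start=1):
        spent += layer.compute_cost
        if spent > storage_budget:
            break  # the storage tier cannot afford a longer prefix
        if layer.output_bytes < best_bytes:
            best_k, best_bytes = k, layer.output_bytes
    return best_k, best_bytes


# Hypothetical profile of a small frozen feature extractor.
layers = [
    Layer("conv1", 800, 1.0),
    Layer("pool1", 200, 0.1),
    Layer("conv2", 400, 2.0),
    Layer("pool2", 100, 0.1),
]

print(choose_split(layers, input_bytes=600, storage_budget=1.5))   # (2, 200)
print(choose_split(layers, input_bytes=600, storage_budget=10.0))  # (4, 100)
```

With a tight budget the sketch stops at `pool1`, which already sends less data than the raw input. With a generous budget it pushes the whole prefix down to `pool2`.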

Code & Models

Repositories

aguirguis/collabml_client (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Machine Learning and ELM · Privacy-Preserving Technologies in Data

Methods: Attention Is All You Need · 1x1 Convolution · Batch Normalization · Convolution · Residual Block · Kaiming Initialization · Dense Connections · Max Pooling · Linear Layer