InTune: Reinforcement Learning-based Data Pipeline Optimization for Deep Recommendation Models

Kabir Nagrecha; Lingyi Liu; Pablo Delgado; Prasanna Padmanabhan

arXiv:2308.08500 · cs.IR · August 17, 2023

TL;DR

This paper introduces InTune, a reinforcement learning-based system that optimizes data pipeline resource allocation for deep recommendation models, significantly improving data ingestion throughput and resource utilization.

Contribution

We develop InTune, a novel RL-driven data pipeline optimizer that adapts resource distribution in real time, outperforming existing tools in efficiency and robustness.
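
To make the idea of learning a resource distribution by trial and error concrete, here is a minimal sketch. It is not InTune's agent, which this page does not describe. It is a toy epsilon-greedy loop over a simulated pipeline, and the names `throughput` and `tune`, the per-worker rates, and the "move one worker" action space are all assumptions made for illustration.

```python
import random


def throughput(workers, rates):
    """Toy model: the pipeline runs at the speed of its slowest stage."""
    return min(w * r for w, r in zip(workers, rates))


def tune(rates, total_workers, steps=2000, epsilon=0.2, seed=0):
    """Epsilon-greedy search over 'move one worker from stage i to stage j'.

    Hypothetical illustration only; assumes total_workers >= len(rates).
    """
    rng = random.Random(seed)
    n = len(rates)
    # Start from an even split; leftover workers go to the first stages.
    workers = [total_workers // n + (1 if i < total_workers % n else 0)
               for i in range(n)]
    actions = [(i, j) for i in range(n) for j in range(n) if i != j]
    value = {a: 0.0 for a in actions}  # running mean reward per action
    count = {a: 0 for a in actions}
    best = list(workers)
    for _ in range(steps):
        if rng.random() < epsilon:
            src, dst = rng.choice(actions)  # explore
        else:
            src, dst = max(actions, key=lambda a: value[a])  # exploit
        if workers[src] <= 1:
            continue  # every stage keeps at least one worker
        before = throughput(workers, rates)
        workers[src] -= 1
        workers[dst] += 1
        # Reward is the change in simulated throughput caused by the move.
        reward = throughput(workers, rates) - before
        count[(src, dst)] += 1
        value[(src, dst)] += (reward - value[(src, dst)]) / count[(src, dst)]
        if throughput(workers, rates) > throughput(best, rates):
            best = list(workers)  # remember the best allocation seen so far
    return best


if __name__ == "__main__":
    rates = [100.0, 25.0, 50.0]  # assumed samples/sec per worker, per stage
    allocation = tune(rates, total_workers=14)
    print(allocation, throughput(allocation, rates))
```

The sketch only shows the feedback loop of acting, observing throughput, and updating an estimate. A production optimizer would measure a live pipeline rather than a closed-form model and would need a richer state and reward.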

Findings

01. InTune increases data ingestion throughput by up to 2.29x.

02. InTune reduces idle times and improves CPU & GPU utilization.

03. The system adapts quickly, optimizing pipelines within minutes.

Abstract

Deep learning-based recommender models (DLRMs) have become an essential component of many modern recommender systems. Several companies are now building large compute clusters reserved only for DLRM training, driving new interest in cost- and time-saving optimizations. The systems challenges faced in this setting are unique; while typical deep learning training jobs are dominated by model execution, the most important factor in DLRM training performance is often online data ingestion. In this paper, we explore the unique characteristics of this data ingestion problem and provide insights into DLRM training pipeline bottlenecks and challenges. We study real-world DLRM data processing pipelines taken from our compute cluster at Netflix to observe the performance impacts of online ingestion and to identify shortfalls in existing pipeline optimizers. We find that current tooling either…

Taxonomy

Topics: Recommender Systems and Techniques · Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data