You Do Not Need a Bigger Boat: Recommendations at Reasonable Scale in a (Mostly) Serverless and Open Stack

Jacopo Tagliabue

arXiv:2107.07346·cs.LG·July 16, 2021

TL;DR

This paper presents a serverless, open-source data pipeline template for machine learning at a reasonable scale, enabling industry practitioners to leverage recommender system research without extensive infrastructure.

Contribution

It introduces a practical template data stack that simplifies pipeline setup with serverless and open-source tools, addressing the immature data pipelines that keep many industry practitioners from applying recent recommender-system research.

Findings

01. Modern open source tools can process terabytes of data efficiently.

02. Serverless paradigms reduce infrastructure complexity.

03. The proposed template improves data pipeline reliability and scalability.

Abstract

We argue that immature data pipelines are preventing a large portion of industry practitioners from leveraging the latest research on recommender systems. We propose our template data stack for machine learning at "reasonable scale", and show how many challenges are solved by embracing a serverless paradigm. Leveraging our experience, we detail how modern open source can provide a pipeline processing terabytes of data with limited infrastructure work.
