TL;DR
This paper presents a serverless, open-source data pipeline template for machine learning at "reasonable scale", so that industry practitioners can apply recommender system research without extensive infrastructure work.
Contribution
It introduces a practical data stack, built from serverless and open-source tools, that simplifies pipeline setup and addresses the immature data pipelines that hold back many industry practitioners.
Findings
Modern open-source tools can process terabytes of data with limited infrastructure work.
Serverless paradigms reduce infrastructure complexity.
The proposed template improves data pipeline reliability and scalability.
Abstract
We argue that immature data pipelines are preventing a large portion of industry practitioners from leveraging the latest research on recommender systems. We propose our template data stack for machine learning at "reasonable scale", and show how many challenges are solved by embracing a serverless paradigm. Leveraging our experience, we detail how modern open source can provide a pipeline processing terabytes of data with limited infrastructure work.
