PreSto: An In-Storage Data Preprocessing System for Training Recommendation Models
Yunjae Lee, Hyeseong Kim, Minsoo Rhu

TL;DR
PreSto is a storage-centric data preprocessing system for recommendation models that significantly speeds up preprocessing, reduces costs, and improves energy efficiency by offloading operations to in-storage processing units.
Contribution
PreSto introduces a novel in-storage processing approach to overcome CPU bottlenecks in RecSys data preprocessing, enhancing speed and efficiency.
Findings
9.6× faster preprocessing
4.3× better cost-efficiency
11.3× improved energy efficiency
Abstract
Training recommendation systems (RecSys) is challenging because it requires a "data preprocessing" stage that preprocesses a large amount of raw data and feeds it to the GPU for training seamlessly. To sustain high training throughput, state-of-the-art solutions reserve a large fleet of CPU servers for preprocessing, which incurs substantial deployment cost and power consumption. Our characterization reveals that prior CPU-centric preprocessing is bottlenecked on feature generation and feature normalization operations because it fails to exploit the abundant inter-/intra-feature parallelism in RecSys preprocessing. PreSto is a storage-centric preprocessing system leveraging In-Storage Processing (ISP), which offloads the bottlenecked preprocessing operations to our ISP units. We show that PreSto outperforms the baseline CPU-centric system with a speedup in…
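The abstract attributes the CPU bottleneck to feature generation and feature normalization and to unexploited inter-/intra-feature parallelism. The sketch below is a minimal illustration of those two kinds of parallelism, not PreSto's implementation. It assumes typical RecSys operations: bucketizing a dense feature into a categorical one (generation), plus a log transform of dense values and a modulo reduction of raw sparse IDs into an embedding-table range (normalization). All function names, bucket boundaries, and the table size are hypothetical.

```python
import bisect
import math
from concurrent.futures import ThreadPoolExecutor


def bucketize(values, boundaries):
    """Hypothetical feature generation: map each dense value to a bucket index."""
    return [bisect.bisect_right(boundaries, v) for v in values]


def log_normalize(values):
    """Hypothetical dense normalization: log(1 + x), clamping negatives to 0."""
    return [math.log1p(max(v, 0.0)) for v in values]


def hash_ids(ids, table_size):
    """Hypothetical sparse normalization: fold raw IDs into [0, table_size).
    A plain modulo stands in for a real hash function here."""
    return [i % table_size for i in ids]


def preprocess_batch(dense_cols, sparse_cols, boundaries, table_size):
    # Inter-feature parallelism: columns are independent, so each one can be
    # handed to a separate worker. Intra-feature parallelism: within a column,
    # every element is transformed without looking at any other element.
    with ThreadPoolExecutor() as pool:
        dense_out = list(pool.map(log_normalize, dense_cols))
        generated = list(pool.map(lambda c: bucketize(c, boundaries), dense_cols))
        sparse_out = list(pool.map(lambda c: hash_ids(c, table_size), sparse_cols))
    return dense_out, generated, sparse_out


if __name__ == "__main__":
    dense, gen, sparse = preprocess_batch(
        dense_cols=[[0.0, 3.0, 10.0], [1.0, 100.0, -2.0]],
        sparse_cols=[[12345, 7, 99]],
        boundaries=[1.0, 5.0, 50.0],
        table_size=10,
    )
    print(gen)     # [[0, 1, 2], [1, 3, 0]]
    print(sparse)  # [[5, 7, 9]]
```

Python threads do not run CPU-bound work in parallel because of the GIL, so the thread pool here only makes the per-column independence explicit. Hardware that can process many columns and many elements at once, as the abstract argues ISP units can, is what would turn this independence into actual speedup.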
