TL;DR
This paper proposes a scalable Cross-Entropy loss function for sequential recommendation systems with large item catalogs, significantly reducing memory usage while maintaining or improving recommendation quality.
Contribution
It introduces a novel Scalable Cross-Entropy (SCE) loss that efficiently approximates the traditional full CE loss, enabling recommendation over large item catalogs without high GPU memory consumption.
Findings
Reduces peak memory usage by up to 100 times
Maintains or exceeds recommendation quality metrics
Effective for large-scale datasets and models
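To see why memory is the bottleneck for full CE, a back-of-the-envelope estimate helps: a dense logits tensor has one score per (sequence, position, catalog item), so its size grows linearly with the catalog. The sketch below only does that arithmetic. The batch size, sequence length, and catalog sizes are hypothetical values chosen for illustration and are not taken from the paper; the function name `logits_bytes` is likewise an assumption.

```python
def logits_bytes(batch_size, seq_len, num_items, bytes_per_value=4):
    """Bytes needed for one dense logits tensor of shape
    (batch_size, seq_len, num_items), assuming 4-byte floats by default."""
    return batch_size * seq_len * num_items * bytes_per_value


# Hypothetical sizes, for illustration only (not the paper's setup).
full = logits_bytes(batch_size=128, seq_len=200, num_items=1_000_000)
subset = logits_bytes(batch_size=128, seq_len=200, num_items=50_000)

print(f"scores over the full catalog: {full / 2**30:.1f} GiB")
print(f"scores over 50k candidates:   {subset / 2**30:.2f} GiB")
print(f"ratio: {full // subset}x")
```

This counts only the logits themselves; gradients and other intermediate buffers add to the real peak usage, so the figures are a lower bound on what a training step would need.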
Abstract
Scalability plays a crucial role in productionizing modern recommender systems. Even lightweight architectures may suffer from high computational overload due to intermediate calculations, limiting their practicality in real-world applications. Specifically, applying full Cross-Entropy (CE) loss often yields state-of-the-art performance in terms of recommendation quality. Still, it suffers from excessive GPU memory utilization when dealing with large item catalogs. This paper introduces a novel Scalable Cross-Entropy (SCE) loss function in the sequential learning setup. It approximates the CE loss for datasets with large catalogs, improving both time efficiency and memory usage without compromising recommendation quality. Unlike traditional negative sampling methods, our approach utilizes a selective GPU-efficient computation strategy, focusing on the most informative…
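The general idea of approximating CE by scoring only part of the catalog can be illustrated with a small sketch. This is not the paper's SCE algorithm: the abstract above is cut off before the selection rule is stated, so the code assumes that "most informative" means "highest-scoring non-target items". The function names and the example logits are hypothetical. Note also that this toy version still computes every logit before selecting, so it shows the shape of the loss rather than the memory saving.

```python
import math


def cross_entropy(logits, target_index):
    """Softmax cross-entropy for one prediction: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target_index]


def subset_cross_entropy(logits, target_index, num_candidates):
    """CE computed over the target plus the top-scoring other items.

    Assumption: the candidates are the non-target items with the largest
    logits. The paper's actual selection rule is not given in the text above.
    """
    others = [i for i in range(len(logits)) if i != target_index]
    others.sort(key=lambda i: logits[i], reverse=True)
    kept = [target_index] + others[:num_candidates]
    # The target sits at position 0 of the reduced logit list.
    return cross_entropy([logits[i] for i in kept], 0)


# Hypothetical scores for a five-item catalog; item 0 is the true next item.
example_logits = [2.0, 0.5, -1.0, 3.0, -4.0]
full_loss = cross_entropy(example_logits, 0)
approx_loss = subset_cross_entropy(example_logits, 0, num_candidates=2)
print(full_loss, approx_loss)
```

Because the dropped items contribute only small terms to the softmax normalizer, the subset loss is a lower bound on the full loss and approaches it as more high-scoring candidates are kept.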
Taxonomy
Methods: Softmax
