RECE: Reduced Cross-Entropy Loss for Large-Catalogue Sequential Recommenders

Danil Gusak; Gleb Mezentsev; Ivan Oseledets; Evgeny Frolov

arXiv:2408.02354·cs.IR·August 15, 2024

TL;DR

This paper introduces RECE, a novel loss function that reduces GPU memory usage in large-catalogue sequential recommenders, enabling scalable training without sacrificing recommendation quality.

Contribution

RECE uses a GPU-efficient, locality-sensitive-hashing-like algorithm to approximate the large tensor of logits, which significantly lowers memory consumption while maintaining the state-of-the-art recommendation quality of full Cross-Entropy loss.

Findings

01 RECE reduces peak training memory usage by up to 12 times compared to existing methods.

02 RECE retains or exceeds the performance metrics of full Cross-Entropy loss.

03 The method enables scalable training of sequential recommenders with large item catalogues.
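
To give a rough sense of why the logits tensor dominates memory with a large catalogue, here is a back-of-the-envelope sketch. It assumes float32 logits and one score per (sequence position, scored item). The helper `logits_bytes`, the batch size, sequence length, catalogue size and reduced candidate count are all hypothetical values chosen for illustration; none of them come from the paper.

```python
def logits_bytes(batch_size, seq_len, n_scored_items, bytes_per_value=4):
    """Bytes needed to hold one logit per (sequence position, scored item)."""
    return batch_size * seq_len * n_scored_items * bytes_per_value

# Hypothetical sizes chosen for illustration; none are taken from the paper.
batch_size, seq_len, catalogue_size = 128, 200, 1_000_000

# Full CE scores every catalogue item at every sequence position.
full_ce = logits_bytes(batch_size, seq_len, catalogue_size)

# A reduced loss scores only a subset; 10_000 is an assumed candidate count.
reduced = logits_bytes(batch_size, seq_len, 10_000)

print(f"full CE logits: {full_ce / 2**30:.1f} GiB")
print(f"reduced logits: {reduced / 2**30:.2f} GiB")
```

This only counts the logits tensor. Peak training memory also includes model parameters, activations, gradients and optimizer state, so the ratio printed here is not expected to match the reported 12-times reduction.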

Abstract

Scalability is a major challenge in modern recommender systems. In sequential recommendations, full Cross-Entropy (CE) loss achieves state-of-the-art recommendation quality but consumes excessive GPU memory with large item catalogs, limiting its practicality. Using a GPU-efficient locality-sensitive hashing-like algorithm for approximating a large tensor of logits, this paper introduces a novel RECE (REduced Cross-Entropy) loss. RECE significantly reduces memory consumption while preserving the state-of-the-art performance of full CE loss. Experimental results on various datasets show that RECE cuts peak training memory usage by up to 12 times compared to existing methods while retaining or exceeding the performance metrics of CE loss. The approach also opens up new possibilities for large-scale applications in other domains.
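
The abstract does not spell out the algorithm, so the following is only a generic illustration of the idea of using hashing to avoid scoring the whole catalogue. It is not the RECE algorithm. It uses plain sign-random-projection LSH: items whose embeddings fall in the same bucket as the sequence representation are treated as candidate negatives, and cross-entropy is computed over the target plus those candidates. All names (`bucket_id`, `reduced_cross_entropy`, `hyperplanes`) and all sizes are assumptions made for this sketch.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def bucket_id(vec, hyperplanes):
    """Pack the sign of vec against each random hyperplane into an integer."""
    bits = 0
    for h in hyperplanes:
        bits = (bits << 1) | (1 if dot(vec, h) >= 0 else 0)
    return bits

def reduced_cross_entropy(query, item_embs, target, hyperplanes):
    """CE over the target plus items sharing the query's bucket (illustrative only)."""
    q_bucket = bucket_id(query, hyperplanes)
    # Candidate negatives: non-target items hashed to the same bucket as the query.
    candidates = [i for i, e in enumerate(item_embs)
                  if i != target and bucket_id(e, hyperplanes) == q_bucket]
    logits = [dot(query, item_embs[target])]
    logits += [dot(query, item_embs[i]) for i in candidates]
    # Numerically stable log-sum-exp over the reduced set of logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[0], len(candidates)

# Toy data with hypothetical sizes; a fixed seed keeps the run repeatable.
rng = random.Random(0)
dim, n_items, n_planes = 16, 1000, 4
item_embs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_items)]
hyperplanes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]
query = [rng.gauss(0, 1) for _ in range(dim)]

loss, n_used = reduced_cross_entropy(query, item_embs, target=42, hyperplanes=hyperplanes)
print(f"loss={loss:.3f}, negatives scored={n_used} of {n_items - 1}")
```

The design point is that only a fraction of the catalogue contributes logits, so the tensor that must be materialised shrinks accordingly. A practical version would precompute item buckets once per step and run batched on the GPU; the official repository listed under Code & Models is the reference for how RECE actually does this.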

Code & Models

Repositories

dalibra/RECE (PyTorch, official)
