Scaling Session-Based Transformer Recommendations using Optimized Negative Sampling and Loss Functions

Timo Wilm; Philipp Normann; Sophie Baumeister; Paul-Vincent Kobow

arXiv:2307.14906 · cs.IR · April 1, 2025

TL;DR

TRON is a scalable session-based Transformer recommender that uses optimized negative sampling and listwise loss functions to improve recommendation accuracy and training efficiency, demonstrated by large-scale e-commerce dataset evaluations and an A/B test.

Contribution

This paper introduces TRON, a novel Transformer-based recommender that incorporates top-k negative sampling and listwise loss functions for enhanced scalability and performance.

Findings

01

TRON improves on the recommendation quality of current methods such as SASRec and GRU4Rec+ on large-scale e-commerce datasets.

02

TRON maintains training speeds comparable to SASRec.

03

A live A/B test shows an 18.14% increase in click-through rate over SASRec.

Abstract

This work introduces TRON, a scalable session-based Transformer Recommender using Optimized Negative-sampling. Motivated by the scalability and performance limitations of prevailing models such as SASRec and GRU4Rec+, TRON integrates top-k negative sampling and listwise loss functions to enhance its recommendation accuracy. Evaluations on relevant large-scale e-commerce datasets show that TRON improves upon the recommendation quality of current methods while maintaining training speeds similar to SASRec. A live A/B test yielded an 18.14% increase in click-through rate over SASRec, highlighting the potential of TRON in practical settings. For further research, we provide access to our source code at https://github.com/otto-de/TRON and an anonymized dataset at https://github.com/otto-de/recsys-dataset.
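The combination of top-k negative sampling and a listwise loss described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). It assumes that "top-k" means keeping the k highest-scoring sampled negatives, and it assumes a sampled softmax cross-entropy as the listwise loss; the function names, the Gaussian scores, and the values of k are hypothetical and chosen only for illustration.

```python
import math
import random


def top_k_negatives(neg_scores, k):
    """Keep the k highest-scoring negatives (assumed meaning of 'top-k')."""
    return sorted(neg_scores, reverse=True)[:k]


def sampled_softmax_loss(pos_score, neg_scores):
    """Listwise loss (assumed): cross-entropy of the positive item
    against the positive plus all given negatives."""
    logits = [pos_score] + list(neg_scores)
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - pos_score


def top_k_sampled_softmax_loss(pos_score, neg_scores, k):
    """Hypothetical helper: apply the listwise loss to the top-k negatives only."""
    return sampled_softmax_loss(pos_score, top_k_negatives(neg_scores, k))


# Toy scores standing in for model outputs (made up for this sketch).
rng = random.Random(0)
pos_score = 2.0
neg_scores = [rng.gauss(0.0, 1.0) for _ in range(1000)]

loss_all = sampled_softmax_loss(pos_score, neg_scores)
loss_top = top_k_sampled_softmax_loss(pos_score, neg_scores, k=100)
print(f"all negatives: {loss_all:.4f}, top-100 negatives: {loss_top:.4f}")
```

In this sketch the loss is computed over k + 1 scores instead of every sampled negative, and the negatives that remain are the ones the model currently ranks closest to the positive.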

Code & Models

Repositories

otto-de/tron
PyTorch · Official

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Byte Pair Encoding · Linear Layer · Softmax · Layer Normalization · Dense Connections · Dropout · Position-Wise Feed-Forward Layer · Adam