Time to Split: Exploring Data Splitting Strategies for Offline Evaluation of Sequential Recommenders
Danil Gusak, Anna Volodkevich, Anton Klenitskiy, Alexey Vasilev, Evgeny Frolov

TL;DR
This paper investigates various data splitting strategies for evaluating sequential recommender systems, highlighting their impact on model performance and emphasizing the need for more realistic, reproducible evaluation protocols.
Contribution
It systematically compares different data splitting methods for sequential recommendation evaluation, revealing their influence on model rankings and proposing guidelines for better practices.
Findings
Evaluation outcomes vary significantly across splitting strategies.
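Prevalent splits like leave-one-out let training and test periods overlap, causing temporal leakage, so they may not reflect real-world scenarios.
Global temporal splitting offers more realistic evaluation on distinct future periods, but its application to sequential recommendation remains loosely defined.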
Abstract
Modern sequential recommender systems, ranging from lightweight transformer-based variants to large language models, have become increasingly prominent in academia and industry due to their strong performance in the next-item prediction task. Yet common evaluation protocols for sequential recommendations remain insufficiently developed: they often fail to reflect the corresponding recommendation task accurately, or are not aligned with real-world scenarios. Although the widely used leave-one-out split matches next-item prediction, it permits overlap between training and test periods, which leads to temporal leakage and an unrealistically long test horizon, limiting real-world relevance. Global temporal splitting addresses these issues by evaluating on distinct future periods. However, its applications to sequential recommendations remain loosely defined, particularly in terms of…
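To make the contrast between the two splits concrete, here is a minimal sketch on toy data. It is not the authors' code: the function names, the `(user, item, timestamp)` tuple layout, and the toy interactions are all hypothetical. The rule used to pick a test target in the global temporal split (the first interaction at or after the cutoff) is an assumption chosen only for illustration, since that choice is not specified in the text above.

```python
from collections import defaultdict


def group_by_user(interactions):
    """Group (user, item, timestamp) triples into per-user sequences sorted by time."""
    sequences = defaultdict(list)
    for user, item, ts in sorted(interactions, key=lambda row: row[2]):
        sequences[user].append((item, ts))
    return dict(sequences)


def leave_one_out_split(interactions):
    """Hold out each user's last interaction as the test target."""
    train, test = {}, {}
    for user, seq in group_by_user(interactions).items():
        if len(seq) < 2:
            train[user] = seq  # too short to hold anything out
            continue
        train[user] = seq[:-1]
        test[user] = seq[-1]
    return train, test


def global_temporal_split(interactions, cutoff):
    """Train on everything before `cutoff`; test on interactions at or after it."""
    train, test = {}, {}
    for user, seq in group_by_user(interactions).items():
        past = [(item, ts) for item, ts in seq if ts < cutoff]
        future = [(item, ts) for item, ts in seq if ts >= cutoff]
        if past:
            train[user] = past
        if past and future:
            # Assumption: the first post-cutoff interaction is the target.
            test[user] = future[0]
    return train, test


def has_temporal_overlap(train, test):
    """True if some training interaction occurs at or after the earliest test target."""
    if not test:
        return False
    earliest_test = min(ts for _, ts in test.values())
    latest_train = max(ts for seq in train.values() for _, ts in seq)
    return latest_train >= earliest_test


# Hypothetical toy log: user u1 is active early, u2 and u3 later.
interactions = [
    ("u1", "a", 1), ("u1", "b", 2), ("u1", "c", 3),
    ("u2", "a", 5), ("u2", "d", 8), ("u2", "e", 9),
    ("u3", "b", 7), ("u3", "c", 10),
]

loo_train, loo_test = leave_one_out_split(interactions)
gts_train, gts_test = global_temporal_split(interactions, cutoff=8)

print("leave-one-out overlap:", has_temporal_overlap(loo_train, loo_test))
print("global temporal overlap:", has_temporal_overlap(gts_train, gts_test))
```

In this toy log, the leave-one-out test target for `u1` (timestamp 3) precedes training interactions of `u2` (timestamps 5 and 8), which is the kind of train/test period overlap the abstract describes. With a single global cutoff, every training interaction precedes every test target, at the cost of dropping users such as `u1` who have no interactions after the cutoff.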
