SplitLight: An Exploratory Toolkit for Recommender Systems Datasets and Splits
Anna Volodkevich, Dmitry Anikin, Danil Gusak, Anton Klenitskiy, Evgeny Frolov, Alexey Vasilev

TL;DR
SplitLight is an open-source toolkit for analyzing and comparing data splitting and preprocessing strategies in recommender system evaluation. It exposes hidden biases and inconsistencies in these choices and makes evaluations easier to reproduce.
Contribution
The paper introduces a toolkit for measuring, comparing, and diagnosing data splits and preprocessing choices in recommender system datasets.
Findings
Detects temporal leakage and cold-start issues
Provides visual comparison of splitting strategies
Supports transparent and reproducible evaluation protocols
Abstract
Offline evaluation of recommender systems is often affected by hidden, under-documented choices in data preparation. Seemingly minor decisions in filtering, handling repeats, cold-start treatment, and splitting strategy design can substantially reorder model rankings and undermine reproducibility and cross-paper comparability. In this paper, we introduce SplitLight, an open-source exploratory toolkit that enables researchers and practitioners designing preprocessing and splitting pipelines or reviewing external artifacts to make these decisions measurable, comparable, and reportable. Given an interaction log and derived split subsets, SplitLight analyzes core and temporal dataset statistics, characterizes repeat consumption patterns and timestamp anomalies, and diagnoses split validity, including temporal leakage, cold-user/item exposure, and distribution shifts. SplitLight further…
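To make the split-validity checks named in the abstract concrete, here is a minimal sketch in plain Python. It is not SplitLight's API: the `Interaction` type, the `diagnose_split` function, and the metric names are hypothetical, chosen only for illustration. It assumes one simple definition of temporal leakage (training interactions timestamped after the earliest test interaction) and measures cold exposure as the share of test interactions whose user or item never appears in training.

```python
from typing import Dict, List, NamedTuple


class Interaction(NamedTuple):
    """One row of an interaction log (hypothetical schema)."""
    user: str
    item: str
    ts: int  # timestamp, e.g. Unix seconds


def diagnose_split(train: List[Interaction], test: List[Interaction]) -> Dict[str, float]:
    """Compute simple leakage and cold-start shares for a train/test split."""
    if not train or not test:
        raise ValueError("both subsets must be non-empty")

    # Assumed leakage definition: a training interaction "leaks" if it
    # happened after the earliest test interaction.
    earliest_test_ts = min(x.ts for x in test)
    leaked = sum(1 for x in train if x.ts > earliest_test_ts)

    # Cold exposure: test interactions whose user or item is unseen in train.
    train_users = {x.user for x in train}
    train_items = {x.item for x in train}
    cold_users = sum(1 for x in test if x.user not in train_users)
    cold_items = sum(1 for x in test if x.item not in train_items)

    return {
        "leaked_train_share": leaked / len(train),
        "cold_user_share": cold_users / len(test),
        "cold_item_share": cold_items / len(test),
    }


# Toy data, invented for this example.
train = [
    Interaction("u1", "i1", 10),
    Interaction("u1", "i2", 20),
    Interaction("u2", "i1", 30),
    Interaction("u2", "i3", 60),  # later than the first test event
]
test = [
    Interaction("u1", "i3", 50),
    Interaction("u3", "i1", 70),  # user u3 never appears in train
    Interaction("u2", "i4", 80),  # item i4 never appears in train
]

report = diagnose_split(train, test)
for name, value in report.items():
    print(f"{name}: {value:.2f}")
```

On the toy data, one of four training interactions falls after the first test event, and one of three test interactions involves an unseen user (likewise for items). Reporting such shares alongside results is one way to make a splitting choice measurable and comparable, in the spirit the abstract describes.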
Taxonomy
Topics: Recommender Systems and Techniques · Explainable Artificial Intelligence (XAI) · Mental Health via Writing
