TL;DR
This study compares off-line and on-line evaluation metrics for small e-commerce recommender systems, examines how the two correlate and where off-line metrics fall short, and proposes a model for predicting on-line performance from off-line results.
Contribution
It provides an extensive off-line analysis of 800 variants of recommending algorithms against 18 metrics, and develops a model that predicts on-line performance from off-line results in small e-commerce contexts.
Findings
Off-line metrics show high variance and partial Pareto optimality.
On-line results are influenced by user novelty and browsing behavior.
Ranking metrics correlate positively with on-line success for novice users.
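One common way to check whether an off-line metric tracks on-line success is a rank correlation across algorithm variants. The sketch below computes Spearman's rho in plain Python. It is only an illustration, not the paper's procedure: the metric names (`offline_ndcg`, `online_ctr`) and all numbers are made-up assumptions and are not taken from the study.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy values, one entry per algorithm variant (NOT from the paper).
offline_ndcg = [0.10, 0.25, 0.18, 0.31, 0.22]      # assumed off-line ranking metric
online_ctr = [0.011, 0.019, 0.016, 0.024, 0.015]   # assumed on-line click-through rate

rho = spearman(offline_ndcg, online_ctr)
print(f"Spearman rho = {rho:.2f}")
```

A rho close to 1 would mean that ordering the variants by the off-line metric reproduces their on-line ordering. With only a handful of variants, as in any A/B test of a few selected algorithms, such a coefficient should be read with caution.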
Abstract
In this paper, we present our work towards comparing on-line and off-line evaluation metrics in the context of small e-commerce recommender systems. Recommending for small e-commerce enterprises is rather challenging due to the lower volume of interactions and low user loyalty, which rarely extends beyond a single session. On the other hand, we usually have to deal with lower volumes of objects, which are easier for users to discover through various browsing/searching GUIs. The main goal of this paper is to determine the applicability of off-line evaluation metrics in learning the true usability of recommender systems (evaluated on-line in A/B testing). In total, 800 variants of recommending algorithms were evaluated off-line w.r.t. 18 metrics covering rating-based, ranking-based, novelty and diversity evaluation. The off-line results were afterwards compared with on-line evaluation of 12 selected…
