It's Time to Consider "Time" when Evaluating Recommender-System Algorithms [Proposal]
Joeran Beel

TL;DR
This paper advocates for evaluating recommender systems using time-series metrics rather than single static numbers, to better understand how algorithm performance evolves over time.
Contribution
It introduces the idea of using time-based evaluation metrics and visualizations for recommender systems, challenging the traditional static single-number approach.
Findings
Time-series metrics reveal performance trends over time.
Static metrics may obscure temporal variations in effectiveness.
The proposed approach supports more meaningful conclusions about how an algorithm will perform in the future.
Abstract
In this position paper, we question the current practice of calculating evaluation metrics for recommender systems as single numbers (e.g. precision p=.28 or mean absolute error MAE = 1.21). We argue that single numbers express only average effectiveness over a usually rather long period (e.g. a year or even longer), which provides only a vague and static view of the data. We propose that recommender-system researchers should instead calculate metrics for time-series such as weeks or months, and plot the results in e.g. a line chart. This way, results show how algorithms' effectiveness develops over time, and hence the results allow drawing more meaningful conclusions about how an algorithm will perform in the future. In this paper, we explain our reasoning, provide an example to illustrate our reasoning and present suggestions for what the community should do next.
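As a minimal sketch of the idea in the abstract, the snippet below computes precision per calendar month alongside the usual single overall number. The function names (`precision_by_month`, `overall_precision`), the event format (one `(date, was_relevant)` pair per delivered recommendation), and the toy log are all assumptions made for illustration; they are not taken from the paper.

```python
from collections import defaultdict
from datetime import date


def precision_by_month(events):
    """Return {(year, month): precision} for an iterable of (date, was_relevant) pairs."""
    hits = defaultdict(int)
    total = defaultdict(int)
    for day, was_relevant in events:
        key = (day.year, day.month)
        total[key] += 1
        hits[key] += int(was_relevant)
    # Sort the keys so the result reads as a chronological series.
    return {key: hits[key] / total[key] for key in sorted(total)}


def overall_precision(events):
    """Return the single static precision over the whole evaluation period."""
    events = list(events)
    return sum(int(relevant) for _, relevant in events) / len(events)


# Hypothetical toy log: four recommendations per month, invented for this example.
log = [
    (date(2024, 1, 5), True), (date(2024, 1, 9), True),
    (date(2024, 1, 20), False), (date(2024, 1, 28), True),
    (date(2024, 2, 3), True), (date(2024, 2, 14), False),
    (date(2024, 2, 15), False), (date(2024, 2, 27), True),
    (date(2024, 3, 2), False), (date(2024, 3, 11), False),
    (date(2024, 3, 19), True), (date(2024, 3, 30), False),
]

monthly = precision_by_month(log)
overall = overall_precision(log)

for (year, month), p in monthly.items():
    print(f"{year}-{month:02d}: precision = {p:.2f}")
print(f"overall: precision = {overall:.2f}")
```

On this invented log the overall precision is 0.50, while the monthly values fall from 0.75 to 0.50 to 0.25. The single number hides the downward trend, which is the kind of pattern a line chart of the monthly series would make visible.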
