Statistical Inference: The Missing Piece of RecSys Experiment Reliability Discourse
Ngozi Ihemelandu, Michael D. Ekstrand

TL;DR
This paper argues that statistical inference should be part of recommender system evaluation. It documents current gaps and challenges through a systematic review of recent RecSys papers and advocates empirical research to improve evaluation reliability.
Contribution
It identifies the lack of focus on statistical inference in RecSys evaluation and calls for empirical studies to develop appropriate inference techniques.
Findings
Current RecSys papers underuse statistical inference
Survey of statistical inference use in information retrieval
Identification of challenges in applying inference to RecSys
Abstract
This paper calls attention to the missing component of the recommender system evaluation process: Statistical Inference. There is active research in several components of the recommender system evaluation process: selecting baselines, standardizing benchmarks, and target item sampling. However, there has not yet been significant work on the role and use of statistical inference for analyzing recommender system evaluation results. In this paper, we argue that the use of statistical inference is a key component of the evaluation process that has not been given sufficient attention. We support this argument with a systematic review of recent RecSys papers to understand how statistical inference is currently being used, along with a brief survey of studies that have been done on the use of statistical inference in the information retrieval community. We present several challenges that exist…
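As a concrete illustration of what statistical inference on evaluation results can look like, the sketch below runs a paired sign-flip permutation test on per-user metric scores from two recommenders. It is a minimal example under stated assumptions, not a method taken from the paper: the abstract does not prescribe any particular test, and the function name `paired_permutation_test`, the choice of nDCG, and the score values are all hypothetical.

```python
import random
from statistics import mean


def paired_permutation_test(scores_a, scores_b, n_resamples=10000, seed=0):
    """Two-sided paired sign-flip permutation test on per-user metric scores.

    Returns the observed mean difference (A minus B) and a Monte Carlo
    estimate of the p-value.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired per user")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = mean(diffs)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis of no difference, the sign of each
        # per-user difference is equally likely to be positive or negative.
        resampled = mean(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(resampled) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Hypothetical per-user nDCG values for two recommenders (illustrative only).
ndcg_a = [0.42, 0.35, 0.51, 0.28, 0.47, 0.39, 0.44, 0.31]
ndcg_b = [0.38, 0.33, 0.45, 0.29, 0.40, 0.36, 0.41, 0.27]
diff, p = paired_permutation_test(ndcg_a, ndcg_b)
print(f"mean difference = {diff:.4f}, p = {p:.4f}")
```

The test is paired because both systems are scored on the same users. Whether this test, a paired t-test, or a bootstrap is appropriate for recommender data is the kind of open question the paper raises, so the choice here should be read as an assumption.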
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Advanced Multi-Objective Optimization Algorithms
