Debias Can be Unreliable: Mitigating Bias Issue in Evaluating Debiasing Recommendation
Chengbing Wang, Wentao Shi, Jizhi Zhang, Wenjie Wang, Hang Pan, Fuli Feng

TL;DR
This paper highlights the unreliability of traditional evaluation methods for debiased recommendation models on randomly-exposed datasets and introduces the Unbiased Recall Evaluation (URE) scheme to provide more accurate performance estimates.
Contribution
The paper proposes the URE scheme to unbiasedly evaluate debiased recommendation models using randomly-exposed datasets, supported by theoretical and experimental validation.
Findings
Recall computed with the traditional evaluation scheme on randomly-exposed datasets is inconsistent with Recall computed on fully-exposed datasets.
URE provides unbiased estimates of Recall on randomly-exposed datasets.
Experiments confirm URE's effectiveness in real-world scenarios.
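To make the first finding concrete, here is a minimal sketch of how Recall@K can differ between a fully-exposed item set and a randomly-exposed subset. It rests on assumptions not stated in the summary: the "traditional" scheme is taken to mean ranking only the exposed items with the same K, and the data, the noise level, and the helper `recall_at_k` are all hypothetical. It does not implement URE, whose correction is not described in the text available here.

```python
import random


def recall_at_k(scores, relevant, candidates, k):
    """Recall@K restricted to `candidates`.

    Ranks the candidate items by model score and returns the fraction of
    relevant candidates found in the top K. Returns None when no candidate
    is relevant, since Recall is undefined in that case.
    """
    candidates = list(candidates)
    n_relevant = sum(1 for i in candidates if i in relevant)
    if n_relevant == 0:
        return None
    top_k = sorted(candidates, key=lambda i: scores[i], reverse=True)[:k]
    hits = sum(1 for i in top_k if i in relevant)
    return hits / n_relevant


# Hypothetical single-user setup: 1000 items, of which 50 are relevant.
rng = random.Random(0)
n_items, k = 1000, 20
relevant = set(range(50))

# Assumed model: a noisy version of the true relevance signal.
scores = {i: (1.0 if i in relevant else 0.0) + rng.gauss(0, 0.5)
          for i in range(n_items)}

# Fully-exposed evaluation: every item is a ranking candidate.
full_recall = recall_at_k(scores, relevant, range(n_items), k)

# Randomly-exposed evaluation (assumed protocol): only a uniform random
# 10% sample of the items has observed feedback, and only those are ranked.
exposed = rng.sample(range(n_items), 100)
exposed_recall = recall_at_k(scores, relevant, exposed, k)

print("Recall@20 on fully-exposed data:   ", full_recall)
print("Recall@20 on randomly-exposed data:", exposed_recall)
```

In this setup the fully-exposed Recall@20 can never exceed 20/50 = 0.4, while the randomly-exposed subset holds only about five relevant items on average, so the same K covers a much larger share of them. The two numbers therefore measure different things, which is the kind of gap the first finding describes.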
Abstract
Recent work has improved recommendation models remarkably by equipping them with debiasing methods. Due to the unavailability of fully-exposed datasets, most existing approaches resort to randomly-exposed datasets as a proxy for evaluating debiased models, employing the traditional evaluation scheme to represent the recommendation performance. However, in this study, we reveal that the traditional evaluation scheme is not suitable for randomly-exposed datasets, leading to inconsistency between the Recall performance obtained using randomly-exposed datasets and that obtained using fully-exposed datasets. Such inconsistency indicates the potential unreliability of experimental conclusions on previous debiasing techniques and calls for unbiased Recall evaluation using randomly-exposed datasets. To bridge the gap, we propose the Unbiased Recall Evaluation (URE) scheme, which adjusts the utilization…
Taxonomy
Topics: Decision-Making and Behavioral Economics · Psychology of Social Influence
