Recall, Robustness, and Lexicographic Evaluation
Fernando Diaz, Michael D. Ekstrand, Bhaskar Mitra

TL;DR
This paper provides a formal analysis of recall in ranking systems, introduces a new evaluation method called lexirecall, and demonstrates its advantages in sensitivity, stability, and fairness across multiple tasks.
Contribution
It offers a formal framework for understanding recall, connects it to robustness and fairness, and proposes lexirecall as a practical, preference-based evaluation method.
Findings
Lexirecall correlates with existing recall metrics.
Lexirecall shows higher sensitivity and stability than existing recall metrics.
The approach enhances understanding of recall in fairness contexts.
Abstract
Although originally developed to evaluate sets of items, recall is often used to evaluate rankings of items, including those produced by recommender, retrieval, and other machine learning systems. The application of recall without a formal evaluative motivation has led to criticism of recall as a vague or inappropriate measure. In light of this debate, we reflect on the measurement of recall in rankings from a formal perspective. Our analysis is composed of three tenets: recall, robustness, and lexicographic evaluation. First, we formally define 'recall-orientation' as the sensitivity of a metric to a user interested in finding every relevant item. Second, we analyze recall-orientation from the perspective of robustness with respect to possible content consumers and providers, connecting recall to recent conversations about fair ranking. Finally, we extend this conceptual and…
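As a rough illustration of why set-based recall can be ambiguous on rankings, the sketch below computes two generic quantities: recall at a cutoff k, and the depth a user must read to find every relevant item. It is not the paper's formal definition of recall-orientation and it is not lexirecall. The function names `recall_at_k` and `depth_for_full_recall` and the toy document identifiers are assumptions made for this example.

```python
from typing import Hashable, Optional, Sequence, Set


def recall_at_k(ranking: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k of the ranking."""
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant items")
    # Use a set so that a duplicated item in the ranking is counted once.
    return len(set(ranking[:k]) & relevant) / len(relevant)


def depth_for_full_recall(ranking: Sequence[Hashable], relevant: Set[Hashable]) -> Optional[int]:
    """1-based rank of the last relevant item, i.e. how far a user who wants
    every relevant item has to read. Returns None if some relevant item is
    never retrieved."""
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant items")
    remaining = set(relevant)
    for rank, item in enumerate(ranking, start=1):
        remaining.discard(item)
        if not remaining:
            return rank
    return None


# Toy data (hypothetical): two rankings of the same five documents.
relevant = {"d1", "d4"}
ranking_a = ["d1", "d2", "d3", "d4", "d5"]
ranking_b = ["d1", "d2", "d3", "d5", "d4"]

# Both rankings have the same recall at cutoff 3 ...
recall_a = recall_at_k(ranking_a, relevant, 3)  # 0.5
recall_b = recall_at_k(ranking_b, relevant, 3)  # 0.5

# ... but a user who needs every relevant item reads further in ranking_b.
depth_a = depth_for_full_recall(ranking_a, relevant)  # 4
depth_b = depth_for_full_recall(ranking_b, relevant)  # 5
```

In this toy case the cutoff-based metric cannot separate the two rankings, while the full-recall depth can, which is the kind of distinction a metric aimed at a user who wants every relevant item would need to capture.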
Taxonomy
Topics: Software Engineering Research
