Pessimistic Evaluation

Fernando Diaz

arXiv:2410.13680·cs.IR·October 18, 2024

TL;DR

This paper advocates for pessimistic evaluation of information access systems, emphasizing worst-case utility to align with principles of equal access and social good, complementing traditional average-based metrics.

Contribution

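The paper introduces a pessimistic evaluation framework that is grounded in ethical and pragmatic concepts, positioned as a theoretical complement to existing robustness and fairness methods, and empirically validated across retrieval and recommendation tasks.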

Findings

01

Pessimistic evaluation complements existing robustness and fairness methods.

02

Empirical validation across multiple tasks supports its practical relevance.

03

The results support including worst-case metrics in existing experimentation processes.

Abstract

Traditional evaluation of information access systems has focused primarily on average utility across a set of information needs (information retrieval) or users (recommender systems). In this work, we argue that evaluating only with average metric measurements assumes utilitarian values not aligned with traditions of information access based on equal access. We advocate for pessimistic evaluation of information access systems focusing on worst case utility. These methods are (a) grounded in ethical and pragmatic concepts, (b) theoretically complementary to existing robustness and fairness methods, and (c) empirically validated across a set of retrieval and recommendation tasks. These results suggest that pessimistic evaluation should be included in existing experimentation processes to better understand the behavior of systems, especially when concerned with principles of social good.
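The contrast between average and worst-case utility can be illustrated with a minimal sketch. It is not the paper's own formulation: the function names, the `fraction` parameter, and the toy per-query scores are all assumptions made for illustration. It compares two hypothetical systems using the mean, the minimum, and the mean of the lowest-scoring fraction of queries.

```python
from statistics import mean


def average_utility(scores):
    """Traditional evaluation: mean utility over all queries or users."""
    return mean(scores)


def worst_case_utility(scores):
    """Pessimistic evaluation in its simplest form: the single lowest utility."""
    return min(scores)


def lower_tail_mean(scores, fraction=0.1):
    """Mean utility over the lowest-scoring `fraction` of queries or users.

    Hypothetical helper: a softer pessimistic summary than the strict minimum.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(scores)
    k = max(1, int(len(ordered) * fraction))  # always keep at least one item
    return mean(ordered[:k])


# Toy per-query scores (made up for illustration, not from the paper).
system_a = [0.9, 0.9, 0.9, 0.9, 0.1]  # strong on most queries, fails one
system_b = [0.7, 0.7, 0.7, 0.7, 0.7]  # uniformly moderate

for name, scores in [("A", system_a), ("B", system_b)]:
    print(
        name,
        round(average_utility(scores), 3),
        round(worst_case_utility(scores), 3),
        round(lower_tail_mean(scores, fraction=0.4), 3),
    )
```

In this toy case, system A has the higher average while system B has the higher worst-case and lower-tail utility, which is the kind of disagreement an average-only evaluation would hide.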

Taxonomy

Methods: Sparse Evolutionary Training