Evaluation Gaps in Machine Learning Practice

Ben Hutchinson; Negar Rostamzadeh; Christina Greer; Katherine Heller; Vinodkumar Prabhakaran

arXiv:2205.05256·cs.LG·May 12, 2022·1 cites

TL;DR

This paper investigates the limited scope of current ML evaluation practices, highlighting the neglect of important contextual and normative factors, and advocates for more comprehensive, context-aware evaluation methods to ensure responsible ML deployment.

Contribution

The paper empirically analyzes evaluation practices in papers from recent high-profile Computer Vision and Natural Language Processing conferences, reveals the implicit normative assumptions behind those practices, and argues for more contextualized evaluation methodologies.

Findings

01. Focus on narrow evaluation metrics in CV and NLP

02. Neglect of contextual and normative properties in evaluations

03. Implicit commitments like consequentialism and quantifiability influence evaluation choices

Abstract

Forming a reliable judgement of a machine learning (ML) model's appropriateness for an application ecosystem is critical for its responsible use, and requires considering a broad range of factors including harms, benefits, and responsibilities. In practice, however, evaluations of ML models frequently focus on only a narrow range of decontextualized predictive behaviours. We examine the evaluation gaps between the idealized breadth of evaluation concerns and the observed narrow focus of actual evaluations. Through an empirical study of papers from recent high-profile conferences in the Computer Vision and Natural Language Processing communities, we demonstrate a general focus on a handful of evaluation methods. By considering the metrics and test data distributions used in these methods, we draw attention to which properties of models are centered in the field, revealing the properties…

Taxonomy

Topics: Ethics and Social Impacts of AI · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)