Metrology for AI: From Benchmarks to Instruments

Chris Welty; Praveen Paritosh; Lora Aroyo

arXiv:1911.01875 · cs.AI · November 6, 2019 · 20 cites

TL;DR

This paper advocates applying metrology, the science of measurement, to AI system evaluation: benchmarks should be treated as measurement instruments, and results should be reported together with the instrument's precision and resolution.

Contribution

It introduces the concept of using metrology to formalize AI evaluation, highlighting the need to report instrument precision and resolution in benchmarks.

Findings

01 Crowd-sourced datasets are commonly used as measurement instruments.

02 Current AI benchmarks often lack reporting of measurement variance.

03 Applying metrology can improve the reliability of AI performance comparisons.
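
To make the second and third findings concrete, here is a minimal sketch of what reporting measurement variance alongside a benchmark score could look like. It is an illustration only, not the paper's procedure: it uses a plain bootstrap over dataset items as a stand-in for instrument precision, and the function name `bootstrap_accuracy`, the per-item outcomes, and the two-standard-deviation threshold are all assumptions made up for this example.

```python
import random
import statistics


def bootstrap_accuracy(correct, n_resamples=1000, seed=0):
    """Return (mean, std dev) of accuracy over bootstrap resamples of the items.

    `correct` is a list of 0/1 outcomes, one per dataset item.
    The std dev is used here as a rough proxy for measurement variance.
    """
    rng = random.Random(seed)
    n = len(correct)
    scores = []
    for _ in range(n_resamples):
        # Resample n items with replacement and score the resample.
        sample = [correct[rng.randrange(n)] for _ in range(n)]
        scores.append(sum(sample) / n)
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical per-item outcomes (1 = system output matched the crowd label).
system_a = [1] * 80 + [0] * 20   # 80% accuracy on 100 items
system_b = [1] * 83 + [0] * 17   # 83% accuracy on 100 items

mean_a, sd_a = bootstrap_accuracy(system_a, seed=0)
mean_b, sd_b = bootstrap_accuracy(system_b, seed=1)

gap = abs(mean_a - mean_b)
spread = (sd_a ** 2 + sd_b ** 2) ** 0.5   # combined spread of the two scores

# Assumed rule of thumb: treat the systems as distinguishable only if
# the gap exceeds twice the combined spread.
distinguishable = gap > 2 * spread

print(f"A: {mean_a:.3f} +/- {sd_a:.3f}")
print(f"B: {mean_b:.3f} +/- {sd_b:.3f}")
print(f"gap={gap:.3f}, spread={spread:.3f}, distinguishable={distinguishable}")
```

With these made-up numbers, the roughly 3-point gap between the two systems falls well inside the combined spread, so a 100-item instrument of this kind could not separate them. A fuller treatment would also account for variance in the crowd labels themselves, which this sketch ignores.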

Abstract

In this paper we present the first steps towards hardening the science of measuring AI systems, by adopting metrology, the science of measurement and its application, and applying it to human (crowd) powered evaluations. We begin with the intuitive observation that evaluating the performance of an AI system is a form of measurement. In all other science and engineering disciplines, the devices used to measure are called instruments, and all measurements are recorded with respect to the characteristics of the instruments used. One does not report mass, speed, or length, for example, of a studied object without disclosing the precision (measurement variance) and resolution (smallest detectable change) of the instrument used. It is extremely common in the AI literature to compare the performance of two systems by using a crowd-sourced dataset as an instrument, but failing to report if the…

Code & Models

Repositories

oana-inel/ResponsibleAIDataCollection

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Anomaly Detection Techniques and Applications · Data Stream Mining Techniques