Good practices for evaluation of machine learning systems

Luciana Ferrer; Odette Scharenborg; Tom Bäckström

arXiv:2412.03700·cs.LG·December 6, 2024

TL;DR

This paper argues that evaluation protocols in machine learning must be designed carefully, covering data selection, metric choice, and statistical significance, so that results are reliable and generalize to unseen data.

Contribution

It provides guidelines and examples for designing effective evaluation procedures in ML, highlighting common pitfalls and best practices.

Findings

01 Proper evaluation design prevents misleading conclusions

02 Careful data and metric selection improves result reliability

03 Statistical significance assessment is crucial for valid comparisons
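As an illustration of the third finding, one common way to judge whether a difference between two systems is meaningful is a paired bootstrap confidence interval over the test samples. The sketch below is not taken from the paper; the function name `paired_bootstrap_ci`, its parameters, and the toy data are all hypothetical, and it assumes both systems were scored on the same, mutually independent test samples.

```python
import random

def paired_bootstrap_ci(correct_a, correct_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy(A) - accuracy(B) on a shared test set.

    correct_a, correct_b: lists of 0/1 flags, one per test sample, in the same order.
    Hypothetical helper for illustration only.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both systems must be scored on the same test samples")
    n = len(correct_a)
    rng = random.Random(seed)
    # Per-sample difference in correctness: 1, 0, or -1.
    diffs = [a - b for a, b in zip(correct_a, correct_b)]
    boot_means = []
    for _ in range(n_boot):
        # Resample test samples with replacement, keeping each pair together.
        resample = [diffs[rng.randrange(n)] for _ in range(n)]
        boot_means.append(sum(resample) / n)
    boot_means.sort()
    lower = boot_means[int((alpha / 2) * n_boot)]
    upper = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return sum(diffs) / n, lower, upper

# Toy data (made up): system A is right on 80 of 100 samples, system B on 70.
correct_a = [1] * 80 + [0] * 20
correct_b = [1] * 70 + [0] * 30
observed, lower, upper = paired_bootstrap_ci(correct_a, correct_b)
print(f"accuracy difference = {observed:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

If the interval comfortably excludes zero, the observed gain is less likely to be an artifact of which test samples happened to be drawn. This sketch captures only test-set sampling variability; it says nothing about variability from random seeds in training, and if test samples are correlated (for example, several samples from the same speaker), resampling should be done per group rather than per sample.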

Abstract

Many development decisions affect the results obtained from ML experiments: training data, features, model architecture, hyperparameters, test data, etc. Among these aspects, arguably the most important design decisions are those that involve the evaluation procedure. This procedure is what determines whether the conclusions drawn from the experiments will or will not generalize to unseen data and whether they will be relevant to the application of interest. If the data is incorrectly selected, the wrong metric is chosen for evaluation or the significance of the comparisons between models is overestimated, conclusions may be misleading or result in suboptimal development decisions. To avoid such problems, the evaluation protocol should be very carefully designed before experimentation starts. In this work we discuss the main aspects involved in the design of the evaluation protocol:…

Taxonomy

Topics: Advanced Data Processing Techniques · Neural Networks and Applications