Evaluating Supervised Machine Learning Models: Principles, Pitfalls, and Metric Selection
Xuanyan Liu, Ignacio Cabrera Martin, Marcello Trovati, Xiaolong Xu, Nikolaos Polatidis

TL;DR
This paper critically examines the principles, challenges, and best practices of evaluating supervised machine learning models, emphasizing that appropriate metrics, sound validation strategies, and context-aware assessment are needed for reliable performance estimates.
Contribution
It offers a structured framework for model evaluation, highlighting common pitfalls and proposing guidelines for selecting metrics and validation methods aligned with real-world objectives.
Findings
Evaluation outcomes are heavily influenced by dataset characteristics and validation design.
Common pitfalls include the accuracy paradox, data leakage, and inappropriate metric choice.
Aligning evaluation with operational objectives enhances model trustworthiness.
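The accuracy paradox named in the findings can be illustrated with a minimal sketch. The labels below are synthetic and chosen only for illustration (5 positives in 100 samples); they are not taken from the paper's experiments, and the helper names are hypothetical. A classifier that always predicts the majority class scores high accuracy while detecting no positives at all.

```python
# Minimal sketch of the accuracy paradox on synthetic, imbalanced labels.
# Assumption: binary labels with 1 = positive (rare) and 0 = negative.

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p == positive:
            fp += 1
        elif t == positive and p != positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def safe_ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0

def evaluate(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = safe_ratio(tp + tn, tp + fp + fn + tn)
    recall = safe_ratio(tp, tp + fn)        # true positive rate
    specificity = safe_ratio(tn, tn + fp)   # true negative rate
    balanced_accuracy = (recall + specificity) / 2
    return {
        "accuracy": accuracy,
        "recall": recall,
        "balanced_accuracy": balanced_accuracy,
    }

# Synthetic data: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
# A trivial "model" that always predicts the majority class.
y_pred = [0] * 100

scores = evaluate(y_true, y_pred)
print(scores)
```

Here accuracy is 0.95 even though recall on the positive class is 0.0, and balanced accuracy falls to 0.5, the level of a constant predictor. This is one reason a single aggregate metric can mislead on imbalanced data.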
Abstract
The evaluation of supervised machine learning models is a critical stage in the development of reliable predictive systems. Despite the widespread availability of machine learning libraries and automated workflows, model assessment is often reduced to the reporting of a small set of aggregate metrics, which can lead to misleading conclusions about real-world performance. This paper examines the principles, challenges, and practical considerations involved in evaluating supervised learning algorithms across classification and regression tasks. In particular, it discusses how evaluation outcomes are influenced by dataset characteristics, validation design, class imbalance, asymmetric error costs, and the choice of performance metrics. Through a series of controlled experimental scenarios using diverse benchmark datasets, the study highlights common pitfalls such as the accuracy paradox,…
