Have we been Naive to Select Machine Learning Models? Noisy Data are here to Stay!

Felipe Costa Farias; Teresa Bernarda Ludermir; Carmelo José Albanez Bastos-Filho

arXiv:2207.06651 · cs.LG · July 15, 2022

TL;DR

This paper critiques traditional model selection methods that rely on single metrics, highlighting their naivety and the impact of noisy data, and proposes a multi-criteria approach to improve model choice.

Contribution

It introduces four theoretical optimality conditions for model selection and applies a multi-criteria decision-making algorithm (TOPSIS), which scores models on proxies for these conditions, so that model selection accounts for data noise.
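The multi-criteria algorithm named in the abstract is TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution). The Python sketch below implements its textbook form: normalise each criterion, weight it, and rank candidates by how close they are to an ideal point and how far they are from an anti-ideal one. The function name `topsis`, the three criteria, and every score are hypothetical stand-ins chosen for illustration; they are not the paper's four optimality conditions or its proxies, which this summary does not list.

```python
import math
from typing import Sequence


def topsis(matrix: Sequence[Sequence[float]],
           weights: Sequence[float],
           benefit: Sequence[bool]) -> list[float]:
    """Textbook TOPSIS closeness coefficient for each row of `matrix`.

    matrix[i][j]: score of candidate model i on criterion j.
    weights[j]:   relative importance of criterion j (rescaled to sum to 1).
    benefit[j]:   True if larger is better, False if smaller is better.
    """
    m, n = len(matrix), len(weights)
    total_w = sum(weights)
    # Vector-normalise each column so criteria on different scales are comparable.
    norms = [math.sqrt(sum(matrix[i][j] ** 2 for i in range(m))) or 1.0
             for j in range(n)]
    v = [[weights[j] / total_w * matrix[i][j] / norms[j] for j in range(n)]
         for i in range(m)]
    # Ideal point takes the best value per criterion, anti-ideal the worst.
    cols = [[row[j] for row in v] for j in range(n)]
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_plus = math.dist(row, ideal)   # distance to the ideal point
        d_minus = math.dist(row, anti)   # distance to the anti-ideal point
        denom = d_plus + d_minus
        # Closeness is 1 at the ideal point and 0 at the anti-ideal point.
        scores.append(d_minus / denom if denom else 0.0)
    return scores


if __name__ == "__main__":
    # Hypothetical candidates scored on hypothetical proxy criteria:
    # validation accuracy (higher is better), train-validation gap (lower is
    # better), accuracy on a noise-perturbed validation set (higher is better).
    names = ["A", "B", "C"]
    table = [[0.95, 0.10, 0.70],
             [0.92, 0.02, 0.88],
             [0.85, 0.03, 0.80]]
    closeness = topsis(table, weights=[1, 1, 1], benefit=[True, False, True])
    ranking = [name for _, name in sorted(zip(closeness, names), reverse=True)]
    print(ranking)  # ['B', 'C', 'A']
```

In this made-up example, candidate A has the best validation accuracy and would win a single-criterion selection, but its large train-validation gap and weak noisy-data score push it to last place once all three criteria are weighed together.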

Findings

01

Traditional single-criterion selection can favor over-fitted models.

02

Considering data noise improves model selection robustness.

03

A multi-criteria approach yields more reasonable model choices.

Abstract

Model selection is usually a single-criterion decision: we select the model that maximizes a specific metric on a specific set, such as validation-set performance. We claim this is naive and can lead to poor selections of over-fitted models because of the over-searching phenomenon, which over-estimates performance on that specific set. Furthermore, real-world data contain noise that must be taken into account during model selection rather than ignored. We also define four theoretical optimality conditions that can be pursued to select models better, and we analyze them with a multi-criteria decision-making algorithm (TOPSIS) that uses proxies for these optimality conditions to select reasonable models.
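The over-searching effect described in the abstract can be reproduced with a toy simulation. The sketch assumes, purely for illustration, that every candidate model has the same true accuracy and makes independent errors on a finite validation set, so any spread in validation scores is sampling noise. Picking the best of many candidates then reports a validation accuracy above the true one, and the optimism grows with the number of candidates searched. The function name `oversearch_gap` and all parameter values are hypothetical; this is not the paper's experimental setup.

```python
import random
import statistics


def oversearch_gap(n_models: int, n_val: int = 200, true_acc: float = 0.8,
                   trials: int = 100, seed: int = 0) -> float:
    """Mean optimism: selected validation accuracy minus true accuracy.

    Hypothetical toy model: all n_models candidates share the same true
    accuracy, and each one's validation errors are drawn independently.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        # Validation accuracy of each candidate on n_val examples.
        val_accs = [sum(rng.random() < true_acc for _ in range(n_val)) / n_val
                    for _ in range(n_models)]
        # Single-criterion selection: keep the best validation score,
        # even though the chosen model's true accuracy is still true_acc.
        gaps.append(max(val_accs) - true_acc)
    return statistics.mean(gaps)


if __name__ == "__main__":
    for k in (1, 10, 100):
        print(f"{k:>3} candidates: optimism = {oversearch_gap(k):+.3f}")
```

With one candidate the average gap is close to zero; with 10 or 100 candidates it becomes clearly positive. Real candidates have correlated errors and different true accuracies, so the size of the gap will differ in practice, but the upward bias of the selected score is the point being illustrated.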

Taxonomy

Topics: Fault Detection and Control Systems · Neural Networks and Applications · Multi-Criteria Decision Making