Have we been Naive to Select Machine Learning Models? Noisy Data are here to Stay!
Felipe Costa Farias, Teresa Bernarda Ludermir, Carmelo José Albanez Bastos-Filho

TL;DR
This paper critiques traditional model selection methods that rely on single metrics, highlighting their naivety and the impact of noisy data, and proposes a multi-criteria approach to improve model choice.
Contribution
It introduces four theoretical optimality conditions for model selection and applies a multi-criteria decision-making algorithm to account for data noise.
Findings
Traditional single-criterion selection can lead to overfitting.
Considering data noise improves model selection robustness.
Multi-criteria approach yields more reasonable model choices.
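The first finding can be illustrated with a small simulation. This is a minimal sketch and not the paper's experiment: every "model" here is a hypothetical coin-flip classifier with a true accuracy of 0.5, and the function name and parameters (`simulate_over_searching`, `n_models`, `n_val`, `n_test`) are assumptions made for illustration. Picking the model with the highest validation accuracy among many such candidates yields a validation score well above what the selected model achieves on fresh data.

```python
import random


def simulate_over_searching(n_models=200, n_val=100, n_test=10_000, seed=0):
    """Select the best of many chance-level models by validation accuracy.

    Returns (validation accuracy of the selected model,
             accuracy of that same model on fresh data).
    """
    rng = random.Random(seed)

    # Each hypothetical model is right with probability 0.5 on every sample,
    # so differences in validation accuracy are pure noise.
    val_acc = [
        sum(rng.random() < 0.5 for _ in range(n_val)) / n_val
        for _ in range(n_models)
    ]

    # Single-criterion selection: take the maximum validation accuracy.
    best = max(range(n_models), key=val_acc.__getitem__)

    # The selected model is still a coin flip on fresh data.
    test_acc = sum(rng.random() < 0.5 for _ in range(n_test)) / n_test
    return val_acc[best], test_acc


best_val, test_acc = simulate_over_searching()
print(f"validation accuracy of selected model: {best_val:.3f}")
print(f"accuracy of selected model on fresh data: {test_acc:.3f}")
```

The gap between the two printed numbers grows with the number of candidates searched and shrinks as the validation set gets larger.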
Abstract
The model selection procedure is usually a single-criterion decision-making process in which we select the model that maximizes a specific metric on a specific set, such as validation-set performance. We claim this is very naive and can lead to poor selections of over-fitted models due to the over-searching phenomenon, which over-estimates the performance on that specific set. Furthermore, real-world data contain noise that must be taken into account when performing model selection. We also define four theoretical optimality conditions that can be pursued to select better models, and we analyze them using a multi-criteria decision-making algorithm (TOPSIS) that considers proxies for the optimality conditions to select reasonable models.
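To make the TOPSIS step concrete, here is a minimal sketch of the standard algorithm in plain Python. The three criteria used below (validation accuracy, train-validation gap, standard deviation across folds), the candidate values, and the equal weights are hypothetical placeholders; the abstract does not list the paper's actual proxies for its four optimality conditions, so these should not be read as the authors' choices.

```python
import math


def topsis(matrix, weights, benefit):
    """Return TOPSIS closeness scores in [0, 1] (higher is better).

    matrix  : rows are alternatives, columns are criteria
    weights : one weight per criterion
    benefit : True if larger is better for that criterion, False if smaller
    Assumes no all-zero column and at least two distinct alternatives.
    """
    n = len(weights)

    # Vector-normalize each column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]

    # Ideal best and worst value per criterion.
    cols = list(zip(*weighted))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]

    # Relative closeness: distance to worst over total distance.
    scores = []
    for row in weighted:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical candidates: [validation accuracy, train-val gap, fold std]
candidates = [
    [0.95, 0.04, 0.05],  # highest validation accuracy, but large gap and variance
    [0.93, 0.01, 0.01],
    [0.85, 0.01, 0.01],
]
weights = [1 / 3, 1 / 3, 1 / 3]
benefit = [True, False, False]

scores = topsis(candidates, weights, benefit)
ranking = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

With these illustrative numbers, a single-criterion rule based on validation accuracy alone would pick the first candidate, while TOPSIS ranks the second candidate highest because it is nearly as accurate and far better on the other two criteria.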
Taxonomy
Topics: Fault Detection and Control Systems · Neural Networks and Applications · Multi-Criteria Decision Making
