Design choice and machine learning model performances
Rosa Arboretti, Riccardo Ceccato, Luca Pegoraro, Luigi Salmaso

TL;DR
This paper investigates how the choice of experimental design influences machine learning model performance in industrial data analysis, providing practical guidelines for practitioners.
Contribution
It offers a comprehensive study comparing 12 experimental designs and 7 ML model families across various test functions and noise conditions, filling a gap in design-model selection guidance.
Findings
Design choice significantly affects ML model accuracy.
Certain designs outperform others depending on noise and function complexity.
Guidelines for selecting design-model combinations are proposed.
Abstract
An increasing number of publications present the joint application of Design of Experiments (DOE) and machine learning (ML) as a methodology to collect and analyze data on a specific industrial phenomenon. However, the literature shows that the choice of the design for data collection and of the model for data analysis is often not driven by statistical or algorithmic advantages; thus, there is a lack of studies that provide guidelines on which designs and ML models to use jointly for data collection and analysis. This article discusses the choice of design in relation to ML model performance. A study is conducted that considers 12 experimental designs, 7 families of predictive models, 7 test functions that emulate physical processes, and 8 noise settings, both homoscedastic and heteroscedastic. The results of the research can have an immediate impact on the work of practitioners,…
