Design choice and machine learning model performances

Rosa Arboretti; Riccardo Ceccato; Luca Pegoraro; Luigi Salmaso

arXiv:2201.10239·stat.ML·May 20, 2022

TL;DR

This paper investigates how the choice of experimental design influences machine learning model performance in industrial data analysis, providing practical guidelines for practitioners.

Contribution

It offers a comprehensive study comparing 12 experimental designs and 7 ML model families across 7 test functions and 8 noise settings (homoscedastic and heteroscedastic), filling a gap in design-model selection guidance.

Findings

01

Design choice significantly affects ML model accuracy.

02

Certain designs outperform others depending on noise and function complexity.

03

Guidelines for selecting design-model combinations are proposed.

Abstract

An increasing number of publications present the joint application of Design of Experiments (DOE) and machine learning (ML) as a methodology to collect and analyze data on a specific industrial phenomenon. However, the literature shows that the choice of the design for data collection and of the model for data analysis is often not driven by statistical or algorithmic advantages, and there is a lack of studies that provide guidelines on which designs and ML models to use jointly for data collection and analysis. This article discusses the choice of design in relation to ML model performance. A study is conducted that considers 12 experimental designs, 7 families of predictive models, 7 test functions that emulate physical processes, and 8 noise settings, both homoscedastic and heteroscedastic. The results of the research can have an immediate impact on the work of practitioners,…
