Testing of Machine Learning Models with Limited Samples: An Industrial Vacuum Pumping Application
Ayan Chatterjee, Bestoun S. Ahmed, Erik Hallin, Anton Engman

TL;DR
This paper addresses the challenge of limited training and testing data in industrial machine learning applications by proposing a data augmentation method and testing strategy to evaluate and improve model robustness, demonstrated on steel industry data.
Contribution
It introduces a novel data augmentation approach based on vacuum principles and a testing framework for assessing ML model robustness in industrial settings with scarce data.
Findings
Ensemble and Neural Network models are most robust with augmented data.
The proposed testing method effectively evaluates model robustness.
The approach enhances ML reliability in industrial applications.
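As a rough illustration of what a robustness check on a small test set can look like, the sketch below fits a simple regressor to a few synthetic pump-down samples and then measures how its error grows when the test inputs are shifted away from what the model saw in training. This is not the authors' augmentation or testing method. The exponential pump-down curve, the parameter values, and all function names (`pump_down_pressure`, `make_samples`, `fit_line`, `rmse`, `robustness_report`) are assumptions made for this example only.

```python
import math
import random


def pump_down_pressure(t, p0=1000.0, tau=60.0):
    """Idealized pump-down curve p(t) = p0 * exp(-t / tau).

    Illustrative assumption only; p0 and tau are made-up values.
    """
    return p0 * math.exp(-t / tau)


def make_samples(n, noise_sd, rng):
    """Generate n (time, log-pressure) pairs with Gaussian noise on log-pressure."""
    samples = []
    for _ in range(n):
        t = rng.uniform(0.0, 300.0)
        y = math.log(pump_down_pressure(t)) + rng.gauss(0.0, noise_sd)
        samples.append((t, y))
    return samples


def fit_line(samples):
    """Ordinary least squares fit of y = a + b * t; returns (a, b)."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in samples)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in samples)
    b = sxy / sxx
    a = mean_y - b * mean_t
    return a, b


def rmse(model, samples):
    """Root-mean-square error of the fitted line on (time, log-pressure) pairs."""
    a, b = model
    return math.sqrt(sum((a + b * t - y) ** 2 for t, y in samples) / len(samples))


def robustness_report(model, test_samples, shifts):
    """Map each input shift to the RMSE obtained after adding it to every test time."""
    report = {}
    for s in shifts:
        shifted = [(t + s, y) for t, y in test_samples]
        report[s] = rmse(model, shifted)
    return report


rng = random.Random(0)                            # fixed seed for repeatability
train = make_samples(40, noise_sd=0.05, rng=rng)  # small training set
test = make_samples(10, noise_sd=0.05, rng=rng)   # handful of test samples
model = fit_line(train)
report = robustness_report(model, test, shifts=[0.0, 5.0, 20.0])

for shift, err in report.items():
    print(f"input shift {shift:5.1f} s -> RMSE {err:.3f}")
```

The error at a shift of zero is the ordinary test error. How quickly it grows with the shift gives one simple, model-agnostic view of robustness to inputs that differ from the expected ones.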
Abstract
There is often a scarcity of training data for machine learning (ML) classification and regression models in industrial production, especially for time-consuming or sparsely run manufacturing processes. A majority of the limited ground-truth data is used for training, while a handful of samples are left for testing. Here, the number of test samples is inadequate to properly evaluate the robustness of the ML models under test for classification and regression. Furthermore, the output of these ML models may be inaccurate or even fail if the input data differ from the expected. This is the case for ML models used in the Electroslag Remelting (ESR) process in the refined steel industry to predict the pressure in a vacuum chamber. A vacuum pumping event that occurs once a workday generates a few hundred samples in a year of pumping for training and testing. In the absence of adequate…
Taxonomy
Topics: Fault Detection and Control Systems · Neural Networks and Applications · Machine Learning and Data Classification
