Understanding and Preparing Data of Industrial Processes for Machine Learning Applications
Philipp Fleck, Manfred Kügel, Michael Kommenda

TL;DR
This paper introduces a novel data preprocessing technique for industrial machine learning applications that effectively handles large proportions of missing sensor data without discarding observations, demonstrated on steel production data.
Contribution
The paper presents a new method for utilizing incomplete industrial data, reducing the need for data removal when missing values are prevalent, with adaptable implementations based on data characteristics.
Findings
Method effectively handles large missing data proportions
Application demonstrated on steel production data
Reduces data loss compared to traditional imputation or removal
Abstract
Industrial applications of machine learning face unique challenges due to the nature of raw industry data. Preprocessing and preparing raw industrial data for machine learning applications is a demanding task that often takes more time and work than the actual modeling process itself and poses additional challenges. This paper addresses one of those challenges, specifically, the challenge of missing values due to sensor unavailability at different production units of nonlinear production lines. In cases where only a small proportion of the data is missing, those missing values can often be imputed. In cases of large proportions of missing data, imputation is often not feasible, and removing observations containing missing values is often the only option. This paper presents a technique that makes it possible to utilize all of the available data without the need to remove large amounts of…
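The abstract contrasts two conventional options: impute when only a small share of values is missing, or drop incomplete observations when the share is large. The minimal Python sketch below illustrates that trade-off on a hypothetical toy table. It is not the technique proposed in the paper, which the truncated abstract does not describe. The column names, the `max_missing` threshold of 0.3, and the helpers `missing_fraction`, `listwise_deletion`, and `mean_impute` are all illustrative assumptions.

```python
from statistics import mean
from typing import Optional

# Hypothetical toy data: each row is one product, each column a sensor reading
# from one production unit. None marks a missing value, e.g. because the
# product did not pass through that unit (nonlinear routing).
Row = dict[str, Optional[float]]

ROWS: list[Row] = [
    {"furnace_temp": 1210.0, "rolling_force": 35.2, "cooling_rate": None},
    {"furnace_temp": 1195.0, "rolling_force": None, "cooling_rate": 12.1},
    {"furnace_temp": 1220.0, "rolling_force": 36.8, "cooling_rate": 11.7},
    {"furnace_temp": None, "rolling_force": 34.9, "cooling_rate": None},
]


def missing_fraction(rows: list[Row], column: str) -> float:
    """Share of rows whose value in `column` is missing."""
    return sum(r[column] is None for r in rows) / len(rows)


def listwise_deletion(rows: list[Row]) -> list[Row]:
    """Keep only rows that have no missing value in any column."""
    return [r for r in rows if all(v is not None for v in r.values())]


def mean_impute(rows: list[Row], max_missing: float = 0.3) -> list[Row]:
    """Fill missing values with the column mean, but only in columns whose
    missing fraction is at most `max_missing` (assumed threshold).
    Columns with more missing data are left untouched."""
    if not rows:
        return []
    fill: dict[str, float] = {}
    for column in rows[0]:
        observed = [r[column] for r in rows if r[column] is not None]
        if observed and missing_fraction(rows, column) <= max_missing:
            fill[column] = mean(observed)
    return [
        {c: (fill[c] if v is None and c in fill else v) for c, v in r.items()}
        for r in rows
    ]


if __name__ == "__main__":
    kept = listwise_deletion(ROWS)
    print(f"listwise deletion keeps {len(kept)} of {len(ROWS)} rows")  # 1 of 4
    imputed = mean_impute(ROWS)
    complete = sum(all(v is not None for v in r.values()) for r in imputed)
    print(f"complete rows after threshold imputation: {complete}")  # 2
```

Dropping every incomplete row keeps only one of four products. Imputing the two sparsely missing columns recovers a second complete row, but `cooling_rate` (50% missing) stays above the assumed threshold, so two rows remain incomplete. That residual gap is the situation the abstract says neither option handles well.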
