Data Quality Over Quantity: Pitfalls and Guidelines for Process Analytics
Lim C. Siang, Shams Elnawawi, Lee D. Rippon, Daniel L. O'Connor, R. Bhushan Gopaluni

TL;DR
This paper emphasizes the importance of data acquisition and preprocessing in industrial process analytics, providing best practices and practical guidelines to improve the reliability of data-driven models and control systems.
Contribution
It offers a comprehensive set of guidelines for acquiring and preparing data in industrial settings, addressing the lack of detail on these steps in published industrial case studies.
Findings
Preprocessing significantly impacts AI success in industry.
Best practices improve the development of reliable soft sensors.
Practical guidelines aid efficient data-driven modeling.
Abstract
A significant portion of the effort involved in advanced process control, process analytics, and machine learning involves acquiring and preparing data. Literature often emphasizes increasingly complex modelling techniques with incremental performance improvements. However, when industrial case studies are published they often lack important details on data acquisition and preparation. Although data pre-processing is unfairly maligned as trivial and technically uninteresting, in practice it has an out-sized influence on the success of real-world artificial intelligence applications. This work describes best practices for acquiring and preparing operating data to pursue data-driven modelling and control opportunities in industrial processes. We present practical considerations for pre-processing industrial time series data to inform the efficient development of reliable soft sensors that…
Taxonomy
Topics: Fault Detection and Control Systems · Time Series Analysis and Forecasting · Data Stream Mining Techniques
