End-to-End Data Quality-Driven Framework for Machine Learning in Production Environment
Firas Bayram, Bestoun S. Ahmed, Erik Hallin

TL;DR
This paper presents an integrated, real-time data quality framework for machine learning in production, improving model performance and reducing latency in industrial settings by combining drift detection, adaptive metrics, and MLOps.
Contribution
It introduces a novel end-to-end framework that unifies data quality assessment with ML operations, enhancing efficiency and practical deployment in industrial environments.
Findings
12% improvement in model performance (R² = 94%)
Fourfold reduction in prediction latency
Effective balancing of data quality thresholds and predictive accuracy
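The threshold trade-off in the last finding can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single, deliberately simple quality score (completeness, the fraction of non-missing readings in a window) and a hypothetical gate that only passes windows meeting a threshold. The names completeness, accept_windows, and retention_by_threshold are invented for illustration, and the sample windows are toy data, not results from the paper.

```python
from typing import Dict, List, Optional, Sequence

Window = Sequence[Optional[float]]


def completeness(window: Window) -> float:
    """Fraction of non-missing readings in a window (stand-in quality score)."""
    if not window:
        return 0.0
    return sum(v is not None for v in window) / len(window)


def accept_windows(windows: Sequence[Window], threshold: float) -> List[Window]:
    """Keep only windows whose quality score meets the threshold."""
    return [w for w in windows if completeness(w) >= threshold]


def retention_by_threshold(
    windows: Sequence[Window], thresholds: Sequence[float]
) -> Dict[float, float]:
    """For each threshold, report the share of windows that pass the gate."""
    total = len(windows)
    if total == 0:
        return {t: 0.0 for t in thresholds}
    return {t: len(accept_windows(windows, t)) / total for t in thresholds}


# Toy sensor windows (assumed data); None marks a missing reading.
windows = [
    [1.0, 2.0, 3.0, 4.0],     # completeness 1.0
    [1.0, None, 3.0, 4.0],    # completeness 0.75
    [None, None, 3.0, 4.0],   # completeness 0.5
    [None, None, None, 4.0],  # completeness 0.25
]

retention = retention_by_threshold(windows, [0.25, 0.5, 0.75, 1.0])
for threshold, share in retention.items():
    print(f"threshold={threshold:.2f} -> {share:.0%} of windows retained")
```

A stricter threshold passes cleaner data to the model but leaves fewer windows to train or predict on, which is the tension a quality-driven pipeline has to tune.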
Abstract
This paper introduces a novel end-to-end framework that efficiently integrates data quality assessment with machine learning (ML) model operations in real-time production environments. While existing approaches treat data quality assessment and ML systems as isolated processes, our framework addresses the critical gap between theoretical methods and practical implementation by combining dynamic drift detection, adaptive data quality metrics, and MLOps into a cohesive, lightweight system. The key innovation lies in its operational efficiency, enabling real-time, quality-driven ML decision-making with minimal computational overhead. We validate the framework in a steel manufacturing company's Electroslag Remelting (ESR) vacuum pumping process, demonstrating a 12% improvement in model performance (R² = 94%) and a fourfold reduction in prediction latency. By exploring the impact of data…
Taxonomy
Topics: Data Quality and Management · Data Stream Mining Techniques · Machine Learning and Data Classification
