CPSLint: A Domain-Specific Language Providing Data Validation and Sanitisation for Industrial Cyber-Physical Systems
Uraz Odyurt, Ömer Sayilir, Mariëlle Stoelinga, Vadim Zaytsev

TL;DR
CPSLint is a domain-specific language designed to automate data validation and sanitisation in industrial cyber-physical systems, improving data quality for machine learning applications.
Contribution
The paper introduces CPSLint, a domain-specific language that lets domain experts without programming experience validate and correct time-series data from industrial settings.
Findings
CPSLint effectively detects and corrects common data corruption patterns.
Evaluation shows reduced manual effort and consistent data sanitisation.
The approach is applicable across various time-series datasets.
Abstract
Industrial cyber-physical systems generate vast amounts of semi-structured time-series data that require careful preprocessing before they can be effectively used for machine learning applications such as fault detection and identification. Raw sensor datasets are often corrupted or incomplete, making it challenging to develop reliable solutions without proper data preparation and validation. In this paper, we introduce CPSLint, a domain-specific language for data validation and sanitisation. We present the design, implementation and evaluation of CPSLint, demonstrating its ability to automatically detect and correct common data corruption patterns while enabling non-programming domain experts to effectively prepare their data for analysis. We report evaluation results on a representative dataset, tracking memory consumption and CPU-time for sanitisation activities. Our approach offers…
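To make the two activities named in the abstract concrete, here is a minimal Python sketch of a validation pass, a sanitisation pass, and a helper that tracks CPU time and peak memory for either one. It is not CPSLint syntax and not the authors' implementation. The function names (`validate`, `sanitise`, `measure`), the `(timestamp, value)` row layout, and the three corruption patterns checked (missing values, duplicate timestamps, out-of-order timestamps) are all assumptions chosen for illustration; the abstract does not enumerate the patterns CPSLint handles.

```python
import math
import time
import tracemalloc


def is_missing(value):
    """Treat None and NaN as missing readings (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def validate(rows):
    """Return (row index, issue) pairs without modifying the data."""
    issues = []
    seen = set()
    latest_ts = None
    for i, (ts, value) in enumerate(rows):
        if ts in seen:
            issues.append((i, "duplicate timestamp"))
        elif latest_ts is not None and ts < latest_ts:
            issues.append((i, "out-of-order timestamp"))
        if is_missing(value):
            issues.append((i, "missing value"))
        seen.add(ts)
        latest_ts = ts if latest_ts is None else max(latest_ts, ts)
    return issues


def sanitise(rows):
    """Keep the first non-missing value per timestamp, sort by time,
    and forward-fill any remaining gaps."""
    unique = {}
    for ts, value in rows:
        if ts not in unique or is_missing(unique[ts]):
            unique[ts] = value
    cleaned = []
    last_good = None
    for ts in sorted(unique):
        value = unique[ts]
        if is_missing(value):
            if last_good is None:
                continue  # no earlier reading to fill from, so drop the row
            value = last_good
        cleaned.append((ts, value))
        last_good = value
    return cleaned


def measure(func, *args):
    """Run func and report its result, CPU seconds, and peak traced bytes."""
    tracemalloc.start()
    cpu_start = time.process_time()
    result = func(*args)
    cpu_seconds = time.process_time() - cpu_start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, cpu_seconds, peak_bytes


# Hypothetical sensor readings: (timestamp, value)
raw = [(0, 1.0), (1, None), (1, 2.0), (3, float("nan")), (2, 4.0)]
issues = validate(raw)
cleaned, cpu_seconds, peak_bytes = measure(sanitise, raw)
```

Keeping validation (report only) separate from sanitisation (repair) mirrors the distinction the abstract draws between detecting and correcting corruption. Forward-filling is only one possible repair policy, swapped in here for brevity.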
