Implementing CPSLint: A Data Validation and Sanitisation Tool for Industrial Cyber-Physical Systems
Uraz Odyurt, Ömer Sayilir, Mariëlle Stoelinga, Vadim Zaytsev

TL;DR
CPSLint is a domain-specific language and tool designed to simplify and standardize data validation and sanitization for large, unstructured time-series datasets in industrial cyber-physical systems.
Contribution
The paper introduces CPSLint, a DSL that enables both data scientists and domain experts to efficiently perform data preparation tasks for industrial CPS data.
Findings
CPSLint reduces code complexity for data sanitization tasks.
The tool improves readability and reusability of data preparation scripts.
CPSLint is publicly available and applicable to various time-series data in CPS.
Abstract
Raw datasets are often too large and unstructured to work with directly, and require a data preparation phase. The domain of industrial Cyber-Physical Systems (CPSs) is no exception, as raw data typically consists of large time-series data collections that log the system's status at regular time intervals. The processing of such raw data is often carried out using ad hoc, case-specific, one-off Python scripts, often neglecting aspects of readability, reusability, and maintainability. In practice, this can cause professionals such as data scientists to write similar data preparation scripts for each case, requiring them to do much repetitive work. We introduce CPSLint, a Domain-Specific Language (DSL) designed to support the data preparation process for industrial CPS. CPSLint raises the level of abstraction to the point where both data scientists and domain experts can perform the data…
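To make the motivation concrete, the following is a minimal sketch of the kind of one-off Python data preparation script the abstract describes. It is not CPSLint syntax and is not taken from the paper. The row layout (timestamp, value), the function names `sanitise` and `find_gaps`, and the 10-second logging interval are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def sanitise(rows):
    """Drop rows with a missing value, then sort and de-duplicate by timestamp."""
    seen = set()
    clean = []
    present = (r for r in rows if r[1] is not None)
    for ts, value in sorted(present, key=lambda r: r[0]):
        if ts not in seen:
            seen.add(ts)
            clean.append((ts, value))
    return clean


def find_gaps(rows, interval):
    """Return (start, end) timestamp pairs spaced further apart than interval."""
    return [(a[0], b[0]) for a, b in zip(rows, rows[1:]) if b[0] - a[0] > interval]


# Hypothetical log: samples expected every 10 seconds.
t0 = datetime(2024, 1, 1, 0, 0, 0)
step = timedelta(seconds=10)
raw = [
    (t0 + 2 * step, 3.0),  # out of order
    (t0, 1.0),
    (t0 + step, None),     # missing reading
    (t0 + 2 * step, 3.0),  # duplicate timestamp
    (t0 + 3 * step, 4.0),
]

clean = sanitise(raw)
gaps = find_gaps(clean, step)
```

Even in this small case, the validation rule (regular intervals) and the sanitisation steps (dropping, sorting, de-duplicating) are tied to one particular data layout, which is the kind of repetition between cases that the abstract attributes to ad hoc scripts.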
