DataLens: ML-Oriented Interactive Tabular Data Quality Dashboard
Mohamed Abdelaal, Samuel Lokadjaja, Arne Kreuz, Harald Schöning

TL;DR
DataLens is an interactive dashboard that automates and enhances tabular data quality management by integrating profiling, error detection, and ML-based repair tools with user-in-the-loop capabilities for improved ML workflows.
Contribution
The paper introduces a comprehensive, interactive platform that combines automation, user guidance, and integration with ML workflows for data cleaning and quality assurance.
Findings
Effective identification and correction of data errors.
Improved data quality for downstream ML tasks.
Enhanced reproducibility with metadata and version control.
Abstract
Maintaining high data quality is crucial for reliable data analysis and machine learning (ML). However, existing data quality management tools often lack automation, interactivity, and integration with ML workflows. This demonstration paper introduces DataLens, a novel interactive dashboard designed to streamline and automate the data quality management process for tabular data. DataLens integrates a suite of data profiling, error detection, and repair tools, including statistical, rule-based, and ML-based methods. It features a user-in-the-loop module for interactive rule validation, data labeling, and custom rule definition, enabling domain experts to guide the cleaning process. Furthermore, DataLens implements an iterative cleaning module that automatically selects optimal cleaning tools based on downstream ML model performance. To ensure reproducibility, DataLens generates…
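The idea of choosing a cleaning tool by its effect on downstream model performance can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the DataLens API: the names `select_repair`, `impute_mean`, `impute_median`, and `drop_missing` are hypothetical, the toy data is invented, and a nearest-centroid classifier stands in for whatever downstream ML model a user would actually train.

```python
from statistics import mean, median

# Toy rows of (feature, label); None marks a missing feature value.
# The data is invented purely for this illustration.
TRAIN = [(1.0, 0), (1.2, 0), (0.9, 0), (None, 0), (None, 0),
         (5.0, 1), (5.5, 1), (6.0, 1)]
VALID = [(1.1, 0), (0.8, 0), (3.5, 1), (5.2, 1)]


def impute_mean(rows):
    """Hypothetical repair: fill missing values with the column mean."""
    fill = mean(x for x, _ in rows if x is not None)
    return [(fill if x is None else x, y) for x, y in rows]


def impute_median(rows):
    """Hypothetical repair: fill missing values with the column median."""
    fill = median(x for x, _ in rows if x is not None)
    return [(fill if x is None else x, y) for x, y in rows]


def drop_missing(rows):
    """Hypothetical repair: discard rows with a missing value."""
    return [(x, y) for x, y in rows if x is not None]


def fit_centroids(rows):
    """Stand-in downstream model: one mean feature value per label."""
    labels = {y for _, y in rows}
    return {lab: mean(x for x, y in rows if y == lab) for lab in labels}


def accuracy(centroids, rows):
    """Fraction of rows whose nearest centroid carries the true label."""
    hits = sum(
        1 for x, y in rows
        if min(centroids, key=lambda lab: abs(x - centroids[lab])) == y
    )
    return hits / len(rows)


def select_repair(train, valid, candidates):
    """Score each candidate repair by downstream validation accuracy."""
    scores = {}
    for name, repair in candidates.items():
        model = fit_centroids(repair(train))
        scores[name] = accuracy(model, valid)
    best = max(scores, key=scores.get)
    return best, scores


CANDIDATES = {
    "impute_mean": impute_mean,
    "impute_median": impute_median,
    "drop_missing": drop_missing,
}

if __name__ == "__main__":
    best, scores = select_repair(TRAIN, VALID, CANDIDATES)
    print(best, scores)
```

In this toy setup the repair is applied only to the training split and each candidate is scored on a clean validation split, so the comparison reflects the effect of the repair on the model rather than on the evaluation data. A real pipeline would likely iterate over detector and repair combinations and use a proper model and metric.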
Taxonomy
Topics: Data Mining Algorithms and Applications · Data Quality and Management
