Badgers: generating data quality deficits with Python
Julien Siebert, Daniel Seifert, Patricia Kelbert, Michael Kläs, Adam Trendowicz

TL;DR
Badgers is an open-source Python library designed to generate various data quality deficits across multiple data modalities, aiding in the experimental evaluation of AI and ML systems' robustness.
Contribution
The paper introduces badgers, an extensible tool for generating controlled data quality deficits in different types of data, for use in testing AI models.
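Extensibility in a tool like this usually means that every deficit shares one interface, so new deficits can be added without touching the rest of the code. The sketch below illustrates that idea under stated assumptions: the base class `DeficitGenerator`, its `generate(X, y)` method and the `MissingValuesGenerator` subclass are hypothetical names chosen for illustration and are not taken from badgers' actual API.

```python
import random
from abc import ABC, abstractmethod
from typing import List, Optional, Sequence, Tuple

Row = List[Optional[float]]


class DeficitGenerator(ABC):
    """Hypothetical base class: every generator maps (X, y) to a degraded (Xt, yt)."""

    @abstractmethod
    def generate(self, X: Sequence[Row], y: Sequence[int]) -> Tuple[List[Row], List[int]]:
        ...


class MissingValuesGenerator(DeficitGenerator):
    """Hypothetical generator: sets a fixed fraction of all cells to None."""

    def __init__(self, fraction: float, seed: int = 0) -> None:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fraction must be in [0, 1]")
        self.fraction = fraction
        self.rng = random.Random(seed)  # seeded so the deficit is reproducible

    def generate(self, X, y):
        Xt = [list(row) for row in X]  # copy so the caller's data is left untouched
        cells = [(i, j) for i, row in enumerate(Xt) for j in range(len(row))]
        n_missing = round(self.fraction * len(cells))
        for i, j in self.rng.sample(cells, n_missing):
            Xt[i][j] = None
        return Xt, list(y)


# Usage: blank out 25% of the 8 cells, i.e. exactly 2 of them.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
y = [0, 1, 0, 1]
Xt, yt = MissingValuesGenerator(fraction=0.25, seed=42).generate(X, y)
```

Keeping each deficit behind the same `generate` call is what lets a test harness treat all deficits uniformly; a new deficit is just another subclass.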
Findings
Supports multiple data modalities including tabular, time-series, and text.
Enables systematic testing of AI models against data quality deficits.
Open-source with comprehensive documentation.
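Systematic testing against a data quality deficit typically means applying the deficit at increasing severity and recording how a model's performance changes. The following is a minimal sketch for the text modality, assuming a simple typo deficit (swapping two adjacent characters in a word) and a toy keyword classifier; the functions `swap_typos`, `keyword_classifier` and `severity_sweep` are hypothetical and not part of badgers.

```python
import random
from typing import Callable, Dict, List, Sequence


def swap_typos(text: str, prob: float, rng: random.Random) -> str:
    """Hypothetical text deficit: with probability `prob`, swap two adjacent
    characters inside each word of length >= 2."""
    words = []
    for word in text.split():
        if len(word) >= 2 and rng.random() < prob:
            k = rng.randrange(len(word) - 1)
            word = word[:k] + word[k + 1] + word[k] + word[k + 2:]
        words.append(word)
    return " ".join(words)


def keyword_classifier(text: str) -> int:
    """Toy model under test: predicts 1 if the exact word 'good' appears."""
    return int("good" in text.split())


def accuracy(model: Callable[[str], int], texts: Sequence[str], labels: Sequence[int]) -> float:
    return sum(model(t) == label for t, label in zip(texts, labels)) / len(labels)


def severity_sweep(model, texts, labels, levels: List[float], seed: int = 0) -> Dict[float, float]:
    """Evaluate the model on copies of the data degraded at each severity level."""
    results = {}
    for prob in levels:
        rng = random.Random(seed)  # same seed per level keeps runs reproducible
        degraded = [swap_typos(t, prob, rng) for t in texts]
        results[prob] = accuracy(model, degraded, labels)
    return results


texts = ["a good movie", "a bad movie", "really good acting", "boring plot"]
labels = [1, 0, 1, 0]
report = severity_sweep(keyword_classifier, texts, labels, [0.0, 0.5, 1.0])
```

At severity 0.0 the data is unchanged and the toy model is perfect; higher levels can only break the positive examples here, so the report shows how quickly accuracy degrades as typos increase.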
Abstract
Generating context-specific data quality deficits is necessary to experimentally assess the data quality of data-driven applications based on artificial intelligence (AI) or machine learning (ML). In this paper, we present badgers, an extensible open-source Python library to generate data quality deficits (outliers, imbalanced data, drift, etc.) for different modalities (tabular data, time-series, text, etc.). The documentation is accessible at https://fraunhofer-iese.github.io/badgers/ and the source code at https://github.com/Fraunhofer-IESE/badgers.
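To make the three deficits named in the abstract concrete, here is a small standard-library sketch. It assumes simple definitions (outliers placed a fixed number of standard deviations from the mean, imbalance created by downsampling one class, drift modelled as a linear trend added to a time series); the functions `inject_outliers`, `make_imbalanced` and `add_linear_drift` are hypothetical illustrations, not badgers functions, and badgers may define these deficits differently.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def inject_outliers(values: Sequence[float], fraction: float, scale: float = 5.0,
                    seed: int = 0) -> Tuple[List[float], List[int]]:
    """Replace a fraction of points with values `scale` population standard
    deviations above or below the mean; return new values and changed indices."""
    rng = random.Random(seed)
    out = list(values)
    mu, sigma = statistics.mean(values), statistics.pstdev(values)
    idx = sorted(rng.sample(range(len(out)), round(fraction * len(out))))
    for i in idx:
        out[i] = mu + rng.choice([-1, 1]) * scale * sigma
    return out, idx


def make_imbalanced(X, y, minority_label, keep_fraction: float, seed: int = 0):
    """Downsample one class so it keeps only `keep_fraction` of its samples."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    kept = set(rng.sample(minority, round(keep_fraction * len(minority))))
    keep = [i for i, label in enumerate(y) if label != minority_label or i in kept]
    return [X[i] for i in keep], [y[i] for i in keep]


def add_linear_drift(series: Sequence[float], slope: float) -> List[float]:
    """Simulate gradual drift in a time series by adding slope * t at step t."""
    return [v + slope * t for t, v in enumerate(series)]


# Usage on tiny synthetic inputs.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
noisy, changed = inject_outliers(values, fraction=0.2)  # 2 of 10 points replaced

X = [[float(i)] for i in range(10)]
y = [0] * 6 + [1] * 4
Xi, yi = make_imbalanced(X, y, minority_label=1, keep_fraction=0.5)  # 6 vs 2

drifted = add_linear_drift([0.0] * 5, slope=0.5)  # [0.0, 0.5, 1.0, 1.5, 2.0]
```

Each function returns new data rather than modifying its input, so the clean and degraded versions can be compared side by side in an experiment.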
Taxonomy
Topics: Data Quality and Management · Data Mining Algorithms and Applications · Big Data and Business Intelligence
Methods: Lib
