Context-Driven Data Mining through Bias Removal and Data Incompleteness Mitigation
Feras A. Batarseh, Ajay Kulkarni

TL;DR
This paper introduces the Context-Driven Data Science Lifecycle (C-DSL), which leverages context to address data quality issues like bias and incompleteness, demonstrated through sports event case studies with improved data mining results.
Contribution
The paper develops the C-DSL framework, which incorporates context into the data science lifecycle to mitigate data quality problems. Traditional data science lifecycles do not consider context.
Findings
C-DSL improves data mining results, as measured by the coefficient of determination (R²) and confusion matrices.
Case studies show effective bias removal and data incompleteness mitigation.
Framework enhances data quality and mining outcomes.
Abstract
The results of data mining endeavors are driven largely by data quality. Throughout these deployments, serious show-stopper problems remain unresolved, such as data collection ambiguities, data imbalance, hidden biases in data, the lack of domain information, and data incompleteness. This paper is based on the premise that context can aid in mitigating these issues. In a traditional data science lifecycle, context is not considered. The Context-driven Data Science Lifecycle (C-DSL), the main contribution of this paper, is developed to address these challenges. Two case studies (using datasets from sports events) are developed to test C-DSL. Results from both case studies are evaluated using common data mining metrics such as the coefficient of determination (R² value) and confusion matrices. The work presented in this paper aims to redefine the lifecycle and introduce tangible…
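The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `r_squared` and `confusion_matrix`, the `"win"`/`"loss"` labels, and all numbers are hypothetical and chosen only to show how each metric is computed.

```python
from collections import Counter


def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        # R^2 is undefined when every observed value is identical.
        raise ValueError("y_true has zero variance")
    return 1.0 - ss_res / ss_tot


def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels]
            for actual in labels]


# Hypothetical regression targets and predictions (illustrative only).
scores_actual = [3.0, 5.0, 7.0, 9.0]
scores_predicted = [2.5, 5.5, 7.0, 9.0]
print(r_squared(scores_actual, scores_predicted))

# Hypothetical classification outcomes (illustrative only).
outcome_actual = ["win", "loss", "win", "win", "loss"]
outcome_predicted = ["win", "win", "win", "loss", "loss"]
print(confusion_matrix(outcome_actual, outcome_predicted, ["win", "loss"]))
```

An R² closer to 1 means the predictions explain more of the variance in the observed values. In the confusion matrix, diagonal entries count correct predictions and off-diagonal entries count misclassifications.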
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Data Mining Algorithms and Applications · Imbalanced Data Classification Techniques
Methods: Test
