Automated Big Data Quality Assessment using Knowledge Graph Embeddings
Hadi Fadlallah, Rima Kilany, Mitri Haber, Ali Jaber

TL;DR
This paper introduces a knowledge graph embedding-based method for automated, context-aware big data quality assessment, improving accuracy over traditional approaches.
Contribution
The paper presents a novel approach using knowledge graph embeddings to generate comprehensive, context-specific data quality assessment plans for big data.
Findings
Successfully predicts missing edges in knowledge graphs for data quality assessment
Enhances understanding of dataset context through knowledge graph integration
Demonstrates effectiveness on real-world radiation sensor data
Abstract
Automated data quality assessment is crucial for managing big data, but existing solutions face challenges in achieving accurate context-aware assessment. This paper presents a novel knowledge-based approach to enhance automated data quality assessment. Our approach utilizes knowledge graph embeddings to predict missing edges between the input dataset's context representation and the relevant quality rules and dimensions within a knowledge graph representing contextual data characteristics and the required quality assessment operations. We surpass conventional practices by integrating diverse representations within the knowledge graph, drawing insights from contextual information from a thorough literature investigation. This integration allows us to develop a comprehensive and context-specific data quality assessment plan tailored to each context. Leveraging the knowledge graph…
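The abstract treats assessment-plan generation as link prediction. The dataset's context is a node in a knowledge graph, and the task is to predict that node's missing edges to quality dimensions and rules. The abstract does not name the embedding model. What follows is a minimal sketch that assumes a TransE-style model, where a triple's score is the negative distance between h + r and t. It uses squared L2 distance to keep the gradients simple and trains on a hypothetical toy graph. Every entity and relation name (`new_sensor_data`, `requires_dimension`, and so on) is an illustrative assumption, not the paper's schema.

```python
import math
import random

# Hypothetical toy knowledge graph. All entity and relation names are
# illustrative assumptions, not the paper's actual schema.
TRIPLES = [
    ("ref_sensor_data", "has_characteristic", "time_series"),
    ("ref_sensor_data", "has_characteristic", "streaming"),
    ("ref_sensor_data", "requires_dimension", "timeliness"),
    ("ref_sensor_data", "requires_dimension", "completeness"),
    ("ref_tabular_data", "has_characteristic", "relational"),
    ("ref_tabular_data", "requires_dimension", "uniqueness"),
    ("ref_tabular_data", "requires_dimension", "consistency"),
    # The new dataset's context is known, but its quality edges are missing.
    ("new_sensor_data", "has_characteristic", "time_series"),
    ("new_sensor_data", "has_characteristic", "streaming"),
]
DIMENSIONS = ["timeliness", "completeness", "uniqueness", "consistency"]


def normalize(v):
    """Scale a vector to unit L2 norm (TransE keeps entity vectors on the unit sphere)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def sq_dist(h, r, t):
    """Squared L2 distance ||h + r - t||^2; smaller means a more plausible triple."""
    return sum((a + b - c) ** 2 for a, b, c in zip(h, r, t))


def train_transe(triples, dim=16, epochs=300, lr=0.05, margin=1.0, seed=0):
    """Fit entity/relation vectors with a margin ranking loss and tail corruption."""
    rng = random.Random(seed)
    entities = sorted({e for h, _, t in triples for e in (h, t)})
    relations = sorted({r for _, r, _ in triples})
    E = {e: normalize([rng.gauss(0, 1) for _ in range(dim)]) for e in entities}
    R = {r: [rng.gauss(0, 1) / math.sqrt(dim) for _ in range(dim)] for r in relations}
    known = set(triples)
    for _ in range(epochs):
        for h, r, t in rng.sample(triples, len(triples)):
            # Negative sample: replace the tail with an entity not linked to (h, r).
            t_neg = rng.choice(entities)
            while (h, r, t_neg) in known:
                t_neg = rng.choice(entities)
            d_pos = sq_dist(E[h], R[r], E[t])
            d_neg = sq_dist(E[h], R[r], E[t_neg])
            if margin + d_pos - d_neg <= 0:
                continue  # hinge loss is already zero for this pair
            g_pos = [2 * (a + b - c) for a, b, c in zip(E[h], R[r], E[t])]
            g_neg = [2 * (a + b - c) for a, b, c in zip(E[h], R[r], E[t_neg])]
            g_hr = [p - n for p, n in zip(g_pos, g_neg)]  # dLoss/dh == dLoss/dr
            E[h] = [x - lr * g for x, g in zip(E[h], g_hr)]
            R[r] = [x - lr * g for x, g in zip(R[r], g_hr)]
            E[t] = [x + lr * g for x, g in zip(E[t], g_pos)]  # dLoss/dt = -g_pos
            E[t_neg] = [x - lr * g for x, g in zip(E[t_neg], g_neg)]  # dLoss/dt_neg = +g_neg
            for e in {h, t, t_neg}:
                E[e] = normalize(E[e])  # project touched entities back to unit norm
    return E, R


def rank_tails(E, R, head, relation, candidates):
    """Score (head, relation, c) for each candidate c, most plausible first."""
    scores = {c: -sq_dist(E[head], R[relation], E[c]) for c in candidates}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    E, R = train_transe(TRIPLES)
    plan = rank_tails(E, R, "new_sensor_data", "requires_dimension", DIMENSIONS)
    for name, score in plan:
        print(f"{name:>12}  {score:.3f}")
```

Ranking `DIMENSIONS` for `new_sensor_data` produces an ordered list of candidate dimensions for its assessment plan. `new_sensor_data` shares its `has_characteristic` edges with `ref_sensor_data`, so TransE tends to place the two embeddings close together. That proximity is what lets the reference dataset's quality edges carry over to the new one. On real graphs, a maintained knowledge graph embedding library such as PyKEEN would replace this hand-rolled trainer.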
