Automated Big Data Quality Assessment using Knowledge Graph Embeddings

Hadi Fadlallah; Rima Kilany; Mitri Haber; Ali Jaber

arXiv:2605.18833 · cs.LG · May 21, 2026

TL;DR

This paper introduces a knowledge graph embedding-based method for automated, context-aware big data quality assessment, improving accuracy over traditional approaches.

Contribution

The paper presents a novel approach using knowledge graph embeddings to generate comprehensive, context-specific data quality assessment plans for big data.

Findings

1. Successfully predicts missing edges in knowledge graphs for data quality assessment
2. Enhances understanding of dataset context through knowledge graph integration
3. Demonstrates effectiveness on real-world radiation sensor data

Abstract

Automated data quality assessment is crucial for managing big data, but existing solutions struggle to achieve accurate, context-aware assessment. This paper presents a novel knowledge-based approach to enhance automated data quality assessment. Our approach uses knowledge graph embeddings to predict missing edges between the input dataset's context representation and the relevant quality rules and dimensions within a knowledge graph that represents contextual data characteristics and the required quality assessment operations. We go beyond conventional practices by integrating diverse representations within the knowledge graph, drawing on contextual information gathered through a thorough literature review. This integration allows us to develop a comprehensive data quality assessment plan tailored to each context. Leveraging the knowledge graph…
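The visible abstract does not name the embedding model, so the following is only a minimal sketch, assuming a TransE-style scoring function (a swapped-in choice, not necessarily the authors' model). It shows how link prediction could rank candidate quality rules for a dataset's context node and how the edges that clear a threshold might be grouped by quality dimension into an assessment plan. Every entity name, the relation `requiresRule`, the embedding values, the dimension labels, and the threshold are hypothetical. In a real system the embeddings would be learned from the knowledge graph rather than set by hand.

```python
import math

# Hypothetical toy embeddings. A trained KG embedding model would learn these;
# they are hand-set here so that the ranking is deterministic.
ENTITY = {
    "ctx:radiation_sensor_stream": [0.9, 0.1, 0.0],
    "rule:range_check": [1.0, 1.0, 0.0],
    "rule:timestamp_monotonic": [0.9, 0.2, 1.0],
    "rule:email_format": [-1.0, 0.0, 0.5],
}
RELATION = {
    "requiresRule": [0.1, 0.9, 0.0],  # hypothetical context -> rule relation
}
# Hypothetical mapping from each quality rule to the dimension it assesses.
RULE_DIMENSION = {
    "rule:range_check": "accuracy",
    "rule:timestamp_monotonic": "consistency",
    "rule:email_format": "validity",
}


def transe_score(head, relation, tail):
    """TransE plausibility: -||h + r - t||_2. Higher means more plausible."""
    h, r, t = ENTITY[head], RELATION[relation], ENTITY[tail]
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_tails(head, relation, candidates):
    """Link prediction: score each candidate tail and sort best first."""
    scored = [(c, transe_score(head, relation, c)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def build_plan(context, threshold):
    """Keep predicted context->rule edges at or above the threshold, grouped by dimension."""
    plan = {}
    for rule, score in rank_tails(context, "requiresRule", list(RULE_DIMENSION)):
        if score >= threshold:
            plan.setdefault(RULE_DIMENSION[rule], []).append(rule)
    return plan


if __name__ == "__main__":
    for rule, score in rank_tails("ctx:radiation_sensor_stream", "requiresRule", list(RULE_DIMENSION)):
        print(f"{rule}: {score:.3f}")
    print(build_plan("ctx:radiation_sensor_stream", threshold=-1.5))
    # prints {'accuracy': ['rule:range_check'], 'consistency': ['rule:timestamp_monotonic']}
```

The fixed score threshold is the simplest way to turn a ranking into a yes/no edge prediction. A top-k cutoff or a calibrated probability would be equally plausible alternatives, and the source does not say which the authors use.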
