Automated Metadata Harmonization Using Entity Resolution & Contextual Embedding
Kunal Sawarkar, Meenkakshi Kodati

TL;DR
This paper presents an automated approach for metadata harmonization in ML data curation, leveraging entity resolution and contextual embeddings to standardize heterogeneous schemas efficiently.
Contribution
It introduces a novel method combining entity resolution with Db2Vec embeddings to automate schema matching and ontological inference in metadata harmonization.
Findings
Effective schema matching using entity resolution and embeddings
Successful inference of ontological structures from source schemas
Reduction in manual effort for metadata curation
Abstract
ML data curation typically involves heterogeneous, federated source systems with varied schema structures, so the curation process must standardize metadata from different schemas into an interoperable schema. This manual process of metadata harmonization and cataloging slows the MLOps lifecycle. We demonstrate automation of this step using entity resolution methods together with Cognitive Database's Db2Vec embedding approach, which captures hidden inter-column and intra-column relationships to detect metadata similarity and then predict the mapping of metadata columns from source schemas to any standardized schema. Apart from matching schemas, we demonstrate that the approach can also infer the correct ontological structure of the target data model.
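To make the schema-matching idea concrete, here is a minimal sketch of mapping source column names onto a standardized target schema. It is not the authors' method: it swaps in plain string similarity from Python's standard-library `difflib` as a stand-in for the entity resolution step, and it does not use Db2Vec embeddings at all. The function names (`normalize`, `similarity`, `match_columns`), the example column names, and the `0.6` threshold are all hypothetical choices made for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase a column name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] between two normalized column names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_columns(source_cols, target_cols, threshold=0.6):
    """Map each source column to its most similar target column.

    A source column whose best score falls below `threshold` (an assumed,
    illustrative cutoff) is mapped to None, meaning "no confident match".
    """
    mapping = {}
    for src in source_cols:
        best = max(target_cols, key=lambda tgt: similarity(src, tgt))
        mapping[src] = best if similarity(src, best) >= threshold else None
    return mapping


# Hypothetical source and target schemas, for illustration only.
source = ["Cust_ID", "first-name", "zip"]
target = ["customer_id", "first_name", "date_of_birth", "postal_code"]
mapping = match_columns(source, target)
print(mapping)
```

In this sketch, names that differ only in punctuation or abbreviation style can be matched, while a column such as `zip` shares almost no characters with `postal_code` and is left unmatched. Gaps of that kind, where the names differ but the meaning agrees, are the sort of case that similarity over learned embeddings is meant to address.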
Taxonomy
Topics: Semantic Web and Ontologies · Advanced Database Systems and Queries · Data Quality and Management
