Contextual Graph Embeddings: Accounting for Data Characteristics in Heterogeneous Data Integration
Yuka Haruki, Shigeru Ishikura, Kazuya Demachi, Teruaki Hayashi

TL;DR
This paper presents a contextual graph embedding method that improves data integration tasks by incorporating structural and contextual information, demonstrating robustness across datasets with diverse properties and highlighting the influence of data characteristics.
Contribution
The study introduces a novel contextual graph embedding technique that accounts for dataset characteristics, enhancing the effectiveness of schema matching and entity resolution in heterogeneous data integration.
Findings
The proposed method outperforms existing graph-based methods across datasets with varying properties.
Contextual embeddings improve matching reliability.
Dataset characteristics significantly influence integration outcomes.
Abstract
As organizations continue to access diverse datasets, the demand for effective data integration has increased. Key tasks in this process, such as schema matching and entity resolution, are essential but often require significant effort. Although previous studies have aimed to automate these tasks, the influence of dataset characteristics on matching effectiveness has not been thoroughly examined, and combinations of different methods remain limited. This study introduces a contextual graph embedding technique that integrates structural details from tabular data and contextual elements such as column descriptions and external knowledge. Tests conducted on datasets with varying properties, such as domain specificity, data size, missing rate, and overlap rate, showed that our approach consistently surpassed existing graph-based methods, especially in difficult scenarios such as those with a…
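The abstract names missing rate and overlap rate as dataset properties but this summary does not define them. The following is a minimal sketch under assumed definitions, not the authors' procedure: missing rate is taken as the fraction of empty cells in a table, and overlap rate as the Jaccard overlap of distinct values between two columns. The function names `missing_rate` and `overlap_rate` and the sample data are hypothetical.

```python
from typing import Iterable, Optional, Sequence

Table = Sequence[Sequence[Optional[str]]]


def missing_rate(table: Table) -> float:
    """Fraction of cells that are None or blank (assumed definition)."""
    cells = [cell for row in table for cell in row]
    if not cells:
        return 0.0
    missing = sum(1 for cell in cells if cell is None or cell.strip() == "")
    return missing / len(cells)


def _distinct_values(column: Iterable[Optional[str]]) -> set:
    """Distinct non-missing values, normalized by trimming and lowercasing."""
    return {v.strip().lower() for v in column if v is not None and v.strip()}


def overlap_rate(col_a: Iterable[Optional[str]], col_b: Iterable[Optional[str]]) -> float:
    """Jaccard overlap of distinct values in two columns (assumed definition)."""
    a, b = _distinct_values(col_a), _distinct_values(col_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical sample data: 6 cells, 2 of them missing.
table = [["Tokyo", "JP"], ["Osaka", None], ["", "JP"]]
print(missing_rate(table))

# One shared value ("tokyo") out of four distinct values overall.
print(overlap_rate(["Tokyo", "Osaka", "Kyoto"], ["tokyo", "Nagoya", None]))
```

Measuring such properties before matching makes it possible to group results by dataset characteristic, which is the kind of analysis the abstract describes.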
Taxonomy
Topics: Data Quality and Management · Semantic Web and Ontologies · Advanced Graph Neural Networks
