Local Embeddings for Relational Data Integration
Riccardo Cappuzzo, Paolo Papotti, Saravanan Thirumuruganathan

TL;DR
This paper introduces EmbDI, a framework that learns local embeddings for relational data by building a graph-based representation of the data and deriving sentences from it, improving performance on data integration tasks such as schema matching and entity resolution.
Contribution
The paper presents a graph-based approach for deriving local embeddings tailored to relational data integration. It addresses a limitation of earlier NLP-inspired methods, which treat each tuple as a sentence and so lose much of the contextual information present in the tuple.
Findings
EmbDI outperforms baseline methods in schema matching tasks.
EmbDI achieves high accuracy in entity resolution in both supervised and unsupervised settings.
The proposed embeddings effectively capture relational context, enhancing data integration quality.
Abstract
Deep learning based techniques have been recently used with promising results for data integration problems. Some methods directly use pre-trained embeddings that were trained on a large corpus such as Wikipedia. However, they may not always be an appropriate choice for enterprise datasets with custom vocabulary. Other methods adapt techniques from natural language processing to obtain embeddings for the enterprise's relational data. However, this approach blindly treats a tuple as a sentence, thus losing a large amount of contextual information present in the tuple. We propose algorithms for obtaining local embeddings that are effective for data integration tasks on relational databases. We make four major contributions. First, we describe a compact graph-based representation that allows the specification of a rich set of relationships inherent in the relational world. Second, we…
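The contrast the abstract draws, between treating a tuple as a sentence and representing the table as a graph, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary above does not give the exact graph structure or walk strategy, so the sketch assumes an undirected graph that links each cell value to a node for its row and a node for its column, and it derives sentences with uniform random walks. The function names (build_graph, random_walk, derive_sentences) and the "row__" and "col__" node prefixes are hypothetical, and the sketch assumes no cell value collides with those prefixes.

```python
import random
from collections import defaultdict


def build_graph(rows, columns):
    """Link every cell value to its row node and its column node (assumed structure)."""
    graph = defaultdict(set)
    for i, row in enumerate(rows):
        row_node = f"row__{i}"  # hypothetical naming scheme for row nodes
        for column, value in zip(columns, row):
            col_node = f"col__{column}"  # hypothetical naming scheme for column nodes
            token = str(value)
            # Undirected edges: token <-> row and token <-> column.
            graph[token].add(row_node)
            graph[row_node].add(token)
            graph[token].add(col_node)
            graph[col_node].add(token)
    return graph


def random_walk(graph, start, length, rng):
    """Uniform random walk of `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        # Sort neighbors so a fixed seed gives a reproducible walk.
        neighbors = sorted(graph[walk[-1]])
        walk.append(rng.choice(neighbors))
    return walk


def derive_sentences(graph, walks_per_node, length, seed=0):
    """Start `walks_per_node` walks from every node; each walk is one sentence."""
    rng = random.Random(seed)
    return [
        random_walk(graph, node, length, rng)
        for node in sorted(graph)
        for _ in range(walks_per_node)
    ]


# Toy table: two tuples that share the value "iPad".
columns = ["name", "product"]
rows = [("Paul", "iPad"), ("Mike", "iPad")]

graph = build_graph(rows, columns)
sentences = derive_sentences(graph, walks_per_node=2, length=5)
print(sentences[0])
```

Because "Paul" and "Mike" are both two hops away from "iPad" in this graph, walks can place them in the same sentence even though they never share a tuple, which a tuple-as-sentence corpus would not capture. The resulting sentences could then be passed to any word-embedding trainer.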
