ClusterEA: Scalable Entity Alignment with Stochastic Training and Normalized Mini-batch Similarities

Yunjun Gao; Xiaoze Liu; Junyang Wu; Tianyi Li; Pengfei Wang; Lu Chen

arXiv:2205.10312 · cs.DB · June 7, 2022

TL;DR

ClusterEA is a scalable entity alignment framework that combines stochastic training with similarity normalization applied per mini-batch, improving alignment accuracy on large knowledge graphs over existing scalable methods.

Contribution

The paper proposes ClusterEA, a novel scalable framework for entity alignment that combines stochastic training, a new sampling strategy, and similarity normalization to handle large-scale knowledge graphs.

Findings

01. Outperforms state-of-the-art scalable EA methods by up to 8 times in Hits@1.

02. Effectively scales to large KGs with improved alignment accuracy.

03. Demonstrates robustness and efficiency on real-world datasets.
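
The first finding above is stated in terms of Hits@1, the fraction of source entities whose true counterpart is ranked first among all candidate targets. The sketch below shows how that metric is commonly computed from a similarity matrix. The function name `hits_at_k` and the toy scores are assumptions made for illustration; they are not taken from the paper or its repositories.

```python
def hits_at_k(similarity, gold, k=1):
    """Fraction of source entities whose gold target ranks in the top k.

    similarity[i][j] is the score between source entity i and target entity j.
    gold[i] is the index of the true counterpart of source entity i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank target indices by descending similarity for this source entity.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if gold[i] in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Toy scores (invented for illustration): 3 source and 3 target entities.
sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.7],
]
gold = [0, 1, 2]

print("Hits@1:", hits_at_k(sim, gold, k=1))
print("Hits@2:", hits_at_k(sim, gold, k=2))
```

In this toy case the second source entity prefers the wrong target at rank 1, so Hits@1 is lower than Hits@2.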

Abstract

Entity alignment (EA) aims at finding equivalent entities in different knowledge graphs (KGs). Embedding-based approaches have dominated the EA task in recent years. Those methods face problems that come from the geometric properties of embedding vectors, including hubness and isolation. To solve these geometric problems, many normalization approaches have been adopted for EA. However, the increasing scale of KGs renders it hard for EA models to adopt the normalization processes, thus limiting their usage in real-world applications. To tackle this challenge, we present ClusterEA, a general framework that is capable of scaling up EA models and enhancing their results by leveraging normalization methods on mini-batches with a high entity equivalent rate. ClusterEA contains three components to align entities between large-scale KGs, including stochastic training, ClusterSampler, and…
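
To make the hubness problem and the idea of normalizing a mini-batch similarity matrix concrete, here is a minimal sketch. The truncated abstract does not say which normalization ClusterEA applies, so this example uses CSLS (cross-domain similarity local scaling), a widely used hubness correction, purely as a stand-in. The names `csls_normalize` and `argmax_rows` and the toy scores are assumptions, not the authors' code.

```python
def csls_normalize(sim, k=1):
    """Apply CSLS to one mini-batch similarity matrix (list of rows).

    Each score is rescaled as 2 * sim[i][j] - r_src[i] - r_tgt[j], where
    r_src[i] is the mean of the top-k scores in row i and r_tgt[j] is the
    mean of the top-k scores in column j. Targets that are close to many
    sources (hubs) get a large r_tgt and are penalized.
    """
    n_src, n_tgt = len(sim), len(sim[0])

    def top_k_mean(values):
        top = sorted(values, reverse=True)[:k]
        return sum(top) / len(top)

    r_src = [top_k_mean(row) for row in sim]
    r_tgt = [top_k_mean([sim[i][j] for i in range(n_src)]) for j in range(n_tgt)]
    return [
        [2 * sim[i][j] - r_src[i] - r_tgt[j] for j in range(n_tgt)]
        for i in range(n_src)
    ]


def argmax_rows(matrix):
    """Index of the best-scoring target for every source entity."""
    return [max(range(len(row)), key=row.__getitem__) for row in matrix]


# Toy mini-batch (invented): target 0 is a hub that both sources prefer.
batch_sim = [
    [0.90, 0.20],
    [0.80, 0.75],
]

raw_choice = argmax_rows(batch_sim)
normalized_choice = argmax_rows(csls_normalize(batch_sim, k=1))
print("raw:", raw_choice)
print("normalized:", normalized_choice)
```

Before normalization both sources pick target 0; after it, the second source switches to target 1. Because the statistics are computed only within the batch, a correction of this kind works well only when each batch contains most of the true counterparts of its entities, which is why the abstract stresses mini-batches with a high entity equivalent rate.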

Taxonomy

Methods: ALIGN