Open Knowledge Graphs Canonicalization using Variational Autoencoders

Sarthak Dash; Gaetano Rossiello; Nandana Mihindukulasooriya; Sugato Bagchi; Alfio Gliozzo

arXiv:2012.04780 · cs.CL · September 29, 2021

TL;DR

This paper introduces CUVA, an end-to-end variational autoencoder model for canonicalizing noun and relation phrases in open knowledge graphs, improving over existing methods and providing a new dataset for evaluation.

Contribution

Proposes CUVA, a joint variational autoencoder model that learns embeddings and cluster assignments simultaneously for better canonicalization.
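As a rough illustration of how one model can learn embeddings and cluster assignments at the same time, the following is the generic evidence lower bound for a variational autoencoder with a mixture-of-Gaussians prior. It is a sketch under assumptions, not CUVA's objective as stated in the paper: the symbols x (an input phrase representation), z (its latent embedding), c (its cluster index), \pi, \mu_c and \sigma_c are all introduced here for illustration.

```latex
\begin{aligned}
p(x, z, c) &= p(c)\, p(z \mid c)\, p(x \mid z), \\
p(c) &= \mathrm{Cat}(c \mid \pi), \qquad
p(z \mid c) = \mathcal{N}\!\left(z \mid \mu_c, \sigma_c^2 I\right), \\
\mathcal{L}(x) &= \mathbb{E}_{q(z, c \mid x)}\!\left[\log p(x \mid z)\right]
  - \mathrm{KL}\!\left(q(z, c \mid x)\,\|\,p(z, c)\right) \le \log p(x).
\end{aligned}
```

Maximizing a bound of this form updates the encoder that produces z and the cluster parameters (\pi, \mu_c, \sigma_c) together, which is the sense in which embeddings and assignments are learned jointly rather than in sequence.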

Findings

01. CUVA outperforms state-of-the-art methods on multiple benchmarks.

02. Introduces CanonicNell, a new dataset for entity canonicalization evaluation.

03. End-to-end learning improves embedding quality and clustering accuracy.

Abstract

Noun phrases and relation phrases in open knowledge graphs are not canonicalized, leading to an explosion of redundant and ambiguous subject-relation-object triples. Existing approaches solve this problem in two steps: first, they generate embedding representations for both noun and relation phrases; then, a clustering algorithm groups the phrases using the embeddings as features. In this work, we propose Canonicalizing Using Variational Autoencoders (CUVA), a joint model that learns both embeddings and cluster assignments end to end, which leads to better vector representations for the noun and relation phrases. Our evaluation over multiple benchmarks shows that CUVA outperforms the existing state-of-the-art approaches. Moreover, we introduce CanonicNell, a novel dataset to evaluate entity canonicalization systems.
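The two-step pipeline that the abstract contrasts CUVA with can be sketched in a few lines. This is a toy illustration only, not the method of any specific system: the bag-of-words `embed` function, the greedy `cluster` routine, the 0.6 similarity threshold, and the example phrases are all assumptions chosen to keep the example self-contained.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embed(phrase, vocab):
    """Step 1 (stand-in): a bag-of-words count vector over a fixed vocabulary."""
    tokens = phrase.lower().split()
    return [float(tokens.count(word)) for word in vocab]


def cluster(phrases, vectors, threshold=0.6):
    """Step 2 (stand-in): greedy single-pass grouping.

    Each phrase joins the first existing cluster that contains a member
    whose cosine similarity to it reaches the threshold; otherwise it
    starts a new cluster.
    """
    clusters = []  # each cluster is a list of indices into `phrases`
    for i, vec in enumerate(vectors):
        for members in clusters:
            if any(cosine(vec, vectors[j]) >= threshold for j in members):
                members.append(i)
                break
        else:
            clusters.append([i])
    return [[phrases[i] for i in members] for members in clusters]


phrases = [
    "Barack Obama", "Obama", "President Obama",
    "New York City", "NYC", "New York",
]
vocab = sorted({word for p in phrases for word in p.lower().split()})
vectors = [embed(p, vocab) for p in phrases]  # step 1: fixed embeddings
clusters = cluster(phrases, vectors)          # step 2: clustering on top

for group in clusters:
    print(group)
```

In this toy run, "NYC" ends up in a cluster of its own because its fixed embedding shares nothing with "New York City". Since the embedding step never receives feedback from the clustering step, such errors cannot be corrected later, which is the kind of limitation a joint, end-to-end model is meant to address.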

Code & Models

Repositories

IBM/Open-KG-canonicalization
PyTorch · Official