TL;DR
This paper introduces CUVA, an end-to-end variational autoencoder model for canonicalizing noun and relation phrases in open knowledge graphs, improving over existing methods and providing a new dataset for evaluation.
Contribution
Proposes CUVA, a joint variational autoencoder model that learns embeddings and cluster assignments simultaneously for better canonicalization.
Findings
CUVA outperforms state-of-the-art methods on multiple benchmarks.
Introduces CanonicNell, a new dataset for entity canonicalization evaluation.
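Learning embeddings and cluster assignments end to end, rather than in two separate steps, yields better vector representations for noun and relation phrases and more accurate clusters.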
Abstract
Noun phrases and relation phrases in open knowledge graphs are not canonicalized, leading to an explosion of redundant and ambiguous subject-relation-object triples. Existing solutions to this problem work in two steps: first, they generate embedding representations for both noun and relation phrases; then, a clustering algorithm groups the phrases using those embeddings as features. In this work, we propose Canonicalizing Using Variational Autoencoders (CUVA), a joint model that learns both embeddings and cluster assignments end to end, which leads to a better vector representation for the noun and relation phrases. Our evaluation over multiple benchmarks shows that CUVA outperforms the existing state-of-the-art approaches. Moreover, we introduce CanonicNell, a novel dataset to evaluate entity canonicalization systems.
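To make the two-step pipeline described in the abstract concrete, here is a minimal toy sketch in Python. It is not CUVA and not any of the baselines the paper compares against. The character-trigram embedding, the greedy threshold clustering, the function names (`embed`, `cosine`, `cluster`), and the threshold value are all assumptions chosen for illustration. The point is only that the embeddings are fixed before clustering starts, so the clustering step cannot improve them, which is the limitation a joint model is meant to address.

```python
from collections import Counter
from math import sqrt


def embed(phrase, n=3):
    """Step 1 (toy stand-in): map a phrase to a sparse bag of character n-grams."""
    text = f" {phrase.lower()} "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(phrases, threshold=0.5):
    """Step 2: greedy single-pass clustering over the already-fixed embeddings.

    A phrase joins the first cluster whose representative (its first member)
    is at least `threshold` similar; otherwise it starts a new cluster.
    """
    vectors = {p: embed(p) for p in phrases}
    clusters = []
    for phrase in phrases:
        for members in clusters:
            if cosine(vectors[phrase], vectors[members[0]]) >= threshold:
                members.append(phrase)
                break
        else:
            # No existing cluster was similar enough.
            clusters.append([phrase])
    return clusters


# Hypothetical noun phrases; each resulting cluster is one canonical entity.
phrases = [
    "Barack Obama",
    "United States",
    "Barack H. Obama",
    "United States of America",
]
clusters = cluster(phrases)
for members in clusters:
    print(members)
```

On this toy input the surface variants of each entity share enough trigrams to be grouped together. Phrases that refer to the same entity without sharing surface form would not be merged by such a sketch, which is why real systems rely on learned embeddings rather than string overlap.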
