# CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information

**Authors:** Shikhar Vashishth, Prince Jain, Partha Talukdar

arXiv: 1902.00172 · 2019-02-04

## TL;DR

CESI canonicalizes Open Knowledge Bases by clustering learned embeddings of noun phrases and relation phrases, enriched with side information, so that redundant and ambiguous extracted facts are merged.

## Contribution

CESI introduces an embedding-based approach to canonicalization that incorporates noun phrase and relation phrase side information, avoiding the expensive and often sub-optimal manual feature engineering that earlier clustering-based methods depend on.

## Key findings

- CESI outperforms existing clustering methods on real-world datasets.
- Embedding-based canonicalization improves accuracy and reduces ambiguity.
- CESI effectively leverages side information to enhance knowledge base quality.

## Abstract

Open Information Extraction (OpenIE) methods extract (noun phrase, relation phrase, noun phrase) triples from text, resulting in the construction of large Open Knowledge Bases (Open KBs). The noun phrases (NPs) and relation phrases in such Open KBs are not canonicalized, leading to the storage of redundant and ambiguous facts. Recent research has posed canonicalization of Open KBs as clustering over manually-defined feature spaces. Manual feature engineering is expensive and often sub-optimal. In order to overcome this challenge, we propose Canonicalization using Embeddings and Side Information (CESI) - a novel approach which performs canonicalization over learned embeddings of Open KBs. CESI extends recent advances in KB embedding by incorporating relevant NP and relation phrase side information in a principled manner. Through extensive experiments on multiple real-world datasets, we demonstrate CESI's effectiveness.
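
To make the idea of "canonicalization as clustering over embeddings" concrete, here is a minimal Python sketch. It is not the authors' implementation: the 2-D vectors are invented toy values standing in for learned embeddings, and the complete-linkage agglomerative clustering, the cosine distance, and the `threshold` parameter are all assumptions chosen for illustration. The sketch does not model how embeddings are learned or how side information is incorporated.

```python
import math
from itertools import combinations

# Hypothetical toy embeddings for noun phrases (made-up values, not learned).
EMBEDDINGS = {
    "Barack Obama": (0.98, 0.10),
    "Obama": (0.95, 0.15),
    "President Obama": (0.97, 0.05),
    "New York City": (0.10, 0.99),
    "NYC": (0.12, 0.97),
}


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def complete_linkage(cluster_a, cluster_b, embeddings):
    """Distance between clusters = largest pairwise distance between members."""
    return max(
        cosine_distance(embeddings[a], embeddings[b])
        for a in cluster_a
        for b in cluster_b
    )


def canonicalize(embeddings, threshold=0.05):
    """Greedily merge the closest pair of clusters until none is within threshold."""
    clusters = [[phrase] for phrase in embeddings]
    while len(clusters) > 1:
        # Find the pair of clusters (i < j) with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: complete_linkage(
                clusters[ij[0]], clusters[ij[1]], embeddings
            ),
        )
        if complete_linkage(clusters[i], clusters[j], embeddings) > threshold:
            break  # Remaining clusters are too far apart to merge.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # Safe because j > i.
    return clusters


if __name__ == "__main__":
    for cluster in canonicalize(EMBEDDINGS):
        print(cluster)
```

Each resulting cluster is a group of phrases treated as referring to the same entity; picking one member as the canonical form and rewriting the triples accordingly would collapse duplicate facts. The quality of such a clustering depends entirely on the embeddings, which is where the learned representations and side information described in the abstract come in.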

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.00172/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1902.00172/full.md

## References

43 references — full list in the complete paper: https://tomesphere.com/paper/1902.00172/full.md

---
Source: https://tomesphere.com/paper/1902.00172