Multi-View Clustering for Open Knowledge Base Canonicalization
Wei Shen, Yang Yang, Yinan Liu

TL;DR
This paper introduces CMVC, an unsupervised multi-view clustering framework that jointly leverages fact and context views to improve open knowledge base canonicalization, outperforming existing methods.
Contribution
The paper proposes a novel multi-view clustering approach, combined with data-driven prediction of the number of clusters, to improve OKB canonicalization without labeled data.
Findings
CMVC outperforms state-of-the-art methods on real-world OKB datasets.
It leverages the fact view and the context view jointly for canonicalization.
It introduces a Log-Jump algorithm that predicts the number of clusters in a data-driven way.
Abstract
Open information extraction (OIE) methods extract plenty of OIE triples <noun phrase, relation phrase, noun phrase> from unstructured text, which compose large open knowledge bases (OKBs). Noun phrases and relation phrases in such OKBs are not canonicalized, which leads to scattered and redundant facts. It is found that two views of knowledge (i.e., a fact view based on the fact triple and a context view based on the fact triple's source context) provide complementary information that is vital to the task of OKB canonicalization, which clusters synonymous noun phrases and relation phrases into the same group and assigns them unique identifiers. However, these two views of knowledge have so far been leveraged in isolation by existing works. In this paper, we propose CMVC, a novel unsupervised framework that leverages these two views of knowledge jointly for canonicalizing OKBs without…
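To make the task concrete, the following is a minimal sketch of canonicalization as described above: noun phrases are grouped by a similarity that combines two views, and each group receives a unique identifier. It is not CMVC's algorithm. The toy vectors, the equal-weight average of the two cosine similarities, the fixed `threshold`, and the single-link grouping via union-find are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def canonicalize(phrases, fact_view, context_view, threshold=0.8):
    """Group phrases whose averaged two-view similarity reaches the threshold.

    Hypothetical helper for illustration only: the real framework learns its
    view embeddings and predicts the number of clusters from the data.
    """
    parent = {p: p for p in phrases}

    def find(p):
        # Follow parent links to the cluster representative.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for i, a in enumerate(phrases):
        for b in phrases[i + 1:]:
            sim = 0.5 * cosine(fact_view[a], fact_view[b]) \
                + 0.5 * cosine(context_view[a], context_view[b])
            if sim >= threshold:
                parent[find(b)] = find(a)  # merge the two clusters

    # Assign one integer identifier per cluster, in first-seen order.
    ids = {}
    return {p: ids.setdefault(find(p), len(ids)) for p in phrases}


phrases = ["NYC", "New York City", "Big Apple", "Paris"]

# Made-up 2-d embeddings; a real system would learn these from the OKB.
fact_view = {
    "NYC": [1.0, 0.0],
    "New York City": [0.9, 0.1],
    "Big Apple": [0.7, 0.7],
    "Paris": [0.0, 1.0],
}
context_view = {
    "NYC": [1.0, 0.0],
    "New York City": [1.0, 0.1],
    "Big Apple": [1.0, 0.0],
    "Paris": [0.0, 1.0],
}

clusters = canonicalize(phrases, fact_view, context_view)
print(clusters)
```

In this toy data, "Big Apple" is only moderately similar to "NYC" in the fact view (cosine of about 0.71, below the 0.8 threshold) but matches it in the context view, so the two are merged only when both views are combined. That is the sense in which the two views are complementary.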
Taxonomy
Topics: Topic Modeling · Text and Document Classification Technologies · Natural Language Processing Techniques
Methods: k-Means Clustering
