CoType: Joint Extraction of Typed Entities and Relations with Knowledge Bases
Xiang Ren, Zeqiu Wu, Wenqi He, Meng Qu, Clare R. Voss, Heng Ji, Tarek F. Abdelzaher, Jiawei Han

TL;DR
CoType is a domain-independent framework that jointly extracts typed entities and relations from text using distant supervision, embedding them into low-dimensional spaces to improve extraction accuracy across various domains.
Contribution
The paper introduces CoType, a joint extraction framework that copes with noisy distant-supervision labels through a partial-label loss and through cross-constraints between entities and relations.
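To make the partial-label idea concrete, here is a minimal sketch, assuming a margin-based hinge form: a mention is penalized only when its best-scoring candidate type (from the noisy distant-supervision label set) fails to beat the best-scoring non-candidate type by a margin. The function name `partial_label_loss`, the score dictionary, and the `margin` parameter are illustrative assumptions, not taken from the authors' code.

```python
from typing import Dict, Set


def partial_label_loss(
    scores: Dict[str, float],
    candidates: Set[str],
    margin: float = 1.0,
) -> float:
    """Hinge-style partial-label loss for a single mention (illustrative).

    scores:     similarity between the mention and each type label
                (hypothetical input; e.g. a dot product of embeddings).
    candidates: noisy candidate types assigned by distant supervision.
    margin:     required gap between candidate and non-candidate scores.
    """
    pos = [s for t, s in scores.items() if t in candidates]
    neg = [s for t, s in scores.items() if t not in candidates]
    if not pos or not neg:
        # Nothing to contrast, so no penalty in this sketch.
        return 0.0
    # Only the single best candidate must win; the other candidates
    # are allowed to be wrong, which is what tolerates label noise.
    return max(0.0, margin - (max(pos) - max(neg)))


# Hypothetical scores for one entity mention.
example_scores = {"person": 0.75, "politician": 0.25, "location": 0.5}
example_loss = partial_label_loss(example_scores, {"person", "politician"})
```

Because only the maximum over the candidate set enters the loss, a wrong candidate such as a spurious "politician" label does not have to score highly, which is the property that makes this kind of loss suited to context-agnostic distant supervision.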
Findings
Achieved an average 25% improvement in F1 score compared to the next best method.
Demonstrated effectiveness across multiple domains including news and biomedical.
Utilized a novel embedding and optimization approach for joint extraction.
Abstract
Extracting entities and relations for types of interest from text is important for understanding massive text corpora. Traditionally, systems of entity relation extraction have relied on human-annotated corpora for training and adopted an incremental pipeline. Such systems require additional human expertise to be ported to a new domain, and are vulnerable to errors cascading down the pipeline. In this paper, we investigate joint extraction of typed entities and relations with labeled data heuristically obtained from knowledge bases (i.e., distant supervision). As our algorithm for type labeling via distant supervision is context-agnostic, noisy training data poses unique challenges for the task. We propose a novel domain-independent framework, called CoType, that runs a data-driven text segmentation algorithm to extract entity mentions, and jointly embeds entity mentions, relation…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
