Simple Large-scale Relation Extraction from Unstructured Text
Christos Christodoulopoulos, Arpit Mittal

TL;DR
This paper introduces a simple yet effective large-scale relation extraction method using distant supervision, demonstrating that feature design plays a crucial role and that simpler classifiers can match complex neural networks in performance.
Contribution
It presents a novel approach for generating distant supervision labels and shows that simple classifiers with well-designed features can perform as well as complex neural networks.
Findings
A much simpler classifier trained on similar features performs on par with a state-of-the-art neural network system, with a 75x reduction in training time.
Feature design significantly impacts relation extraction accuracy.
Distant supervision effectively scales relation extraction systems.
Abstract
Knowledge-based question answering relies on the availability of facts, the majority of which cannot be found in structured sources (e.g. Wikipedia info-boxes, Wikidata). One of the major components of extracting facts from unstructured text is Relation Extraction (RE). In this paper we propose a novel method for creating distant (weak) supervision labels for training a large-scale RE system. We also provide new evidence about the effectiveness of neural network approaches by decoupling the model architecture from the feature design of a state-of-the-art neural network system. Surprisingly, a much simpler classifier trained on similar features performs on par with the highly complex neural network system (with a 75x reduction in training time), suggesting that the features are a bigger contributor to the final performance.
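For readers unfamiliar with distant supervision, the general idea can be sketched as follows. This is a minimal illustration of the standard heuristic (label any sentence that mentions both entities of a known fact with that fact's relation), not the paper's novel labelling method, which the abstract does not describe. The toy knowledge base, the relation names, and the function `distant_labels` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge base: (subject, object) -> relation name.
KB: Dict[Tuple[str, str], str] = {
    ("Barack Obama", "Honolulu"): "place_of_birth",
    ("Amazon", "Seattle"): "headquarters_location",
}

Example = Tuple[str, str, str, str]  # (sentence, subject, object, relation)


def distant_labels(sentences: List[str], kb: Dict[Tuple[str, str], str]) -> List[Example]:
    """Label each sentence that mentions both entities of a known KB fact.

    Uses plain substring matching for entity mentions; a real system would
    rely on entity linking instead (assumption made for brevity).
    """
    examples: List[Example] = []
    for sentence in sentences:
        for (subj, obj), relation in kb.items():
            if subj in sentence and obj in sentence:
                examples.append((sentence, subj, obj, relation))
    return examples


sentences = [
    "Barack Obama was born in Honolulu, Hawaii.",
    "Barack Obama visited Honolulu last week.",   # matches the KB pair but does not express the relation
    "Amazon opened a new office in Berlin.",      # only one entity of the KB pair is present
]

labelled = distant_labels(sentences, KB)
for sentence, subj, obj, relation in labelled:
    print(f"{relation}({subj}, {obj}) <- {sentence}")
```

The second sentence shows why such labels are called weak: it receives the `place_of_birth` label even though it says nothing about a birthplace, so the resulting training data is noisy.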
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
