Simple Large-scale Relation Extraction from Unstructured Text

Christos Christodoulopoulos; Arpit Mittal

arXiv:1803.09091·cs.CL·March 28, 2018

Open Access

TL;DR

This paper introduces a simple yet effective large-scale relation extraction method using distant supervision, demonstrating that feature design plays a crucial role and that simpler classifiers can match complex neural networks in performance.

Contribution

It presents a novel approach for generating distant supervision labels and shows that simple classifiers with well-designed features can perform as well as complex neural networks.

Findings

01. Simple classifiers achieve comparable performance to neural networks.

02. Feature design significantly impacts relation extraction accuracy.

03. Distant supervision effectively scales relation extraction systems.

Abstract

Knowledge-based question answering relies on the availability of facts, the majority of which cannot be found in structured sources (e.g. Wikipedia info-boxes, Wikidata). One of the major components of extracting facts from unstructured text is Relation Extraction (RE). In this paper, we propose a novel method for creating distant (weak) supervision labels for training a large-scale RE system. We also provide new evidence about the effectiveness of neural network approaches by decoupling the model architecture from the feature design of a state-of-the-art neural network system. Surprisingly, a much simpler classifier trained on similar features performs on par with the highly complex neural network system (with a 75x reduction in training time), suggesting that the features are a bigger contributor to the final performance.
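
The abstract does not spell out how the paper's own labels are produced, so the following is only a minimal sketch of the generic distant-supervision idea that the term refers to: a sentence that mentions both entities of a known fact is weakly labelled with that fact's relation. The toy knowledge base `KB`, the function name `distant_labels`, and the example sentences are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge base: (subject, object) -> relation.
# A real system would draw these facts from a source such as Wikidata.
KB: Dict[Tuple[str, str], str] = {
    ("Barack Obama", "Honolulu"): "place_of_birth",
    ("Amazon", "Seattle"): "headquarters",
}


def distant_labels(
    sentences: List[str],
    kb: Dict[Tuple[str, str], str] = KB,
) -> List[Tuple[str, str, str, str]]:
    """Weakly label each sentence that mentions both entities of a KB fact.

    Returns (sentence, subject, object, relation) tuples. Matching is plain
    substring search, which is an assumption made to keep the sketch short.
    """
    labelled = []
    for sentence in sentences:
        for (subj, obj), relation in kb.items():
            if subj in sentence and obj in sentence:
                labelled.append((sentence, subj, obj, relation))
    return labelled


examples = distant_labels([
    "Barack Obama was born in Honolulu, Hawaii.",
    "Barack Obama visited Seattle last year.",
    "Amazon opened a new office near its Seattle campus.",
])
for sentence, subj, obj, relation in examples:
    print(f"{relation}({subj}, {obj}) <- {sentence}")
```

Labels produced this way are noisy by construction: a sentence such as "Barack Obama visited Honolulu" would also receive `place_of_birth`, which is why such labels are called weak supervision.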

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies