Online Inference for Relation Extraction with a Reduced Feature Set

Maxim Rabinovich; Cédric Archambeau

arXiv:1504.04770 · cs.CL · April 21, 2015

TL;DR

This paper evaluates sparse stochastic variational inference (SSVI), an online inference method, for relation extraction with the RelLDA model, highlighting its potential and limitations for scalable knowledge base creation from large unannotated corpora.

Contribution

It provides an empirical assessment of SSVI for relation extraction, identifying its strengths and challenges in large-scale, unsupervised text analysis.

Findings

01. Online inference yields relatively strong qualitative results.

02. Identifies pathologies of both the SSVI scheme and the RelLDA model.

03. Highlights open problems that must be resolved before SSVI can be used for large-scale relation extraction.

Abstract

Access to web-scale corpora is gradually bringing robust automatic knowledge base creation and extension within reach. To exploit these large unannotated (and extremely difficult to annotate) corpora, unsupervised machine learning methods are required. Probabilistic models of text have recently found some success as such a tool, but scalability remains an obstacle to their application: standard approaches rely on sampling schemes that are known to be difficult to scale. In this report, we therefore present an empirical assessment of the sublinear-time sparse stochastic variational inference (SSVI) scheme applied to RelLDA. We demonstrate that online inference leads to relatively strong qualitative results, but we also identify some of its pathologies, and those of the model, which will need to be overcome if SSVI is to be used for large-scale relation extraction.
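The abstract names SSVI without spelling out its update. As a rough illustration only, here is the generic stochastic variational update of LDA-style topic-word parameters, lambda <- (1 - rho) lambda + rho (eta + D/|B| * counts), written with a lazy scale factor so that each minibatch explicitly touches only the entries it observed. This is not the authors' implementation: the class name `SparseTopicWordSVI`, the defaults for `tau0` and `kappa`, the rescaling threshold, the dictionary layout, and the feature key in the usage lines are all hypothetical. The local (per-document) inference step that produces the minibatch counts, which is sampling-based in SSVI, is taken as given, and nothing RelLDA-specific is modeled.

```python
from collections import defaultdict


class SparseTopicWordSVI:
    """Hypothetical sketch of a sparse stochastic variational update for
    topic-word parameters lambda[k, w] in an LDA-style model.

    Stores lambda[k, w] = eta + scale * stats[(k, w)], so the global
    (1 - rho) decay costs O(1) and only minibatch entries are updated.
    """

    def __init__(self, eta, num_docs, tau0=1024.0, kappa=0.7):
        if tau0 <= 1.0:
            # With t starting at 0, tau0 <= 1 would give rho >= 1 on step one.
            raise ValueError("tau0 must exceed 1")
        self.eta = eta            # symmetric Dirichlet prior on topic-word weights
        self.num_docs = num_docs  # corpus size D
        self.tau0 = tau0          # delay: down-weights early, noisy steps
        self.kappa = kappa        # forgetting rate, typically in (0.5, 1]
        self.t = 0                # number of updates applied so far
        self.scale = 1.0          # lazy multiplier shared by all entries
        self.stats = defaultdict(float)  # (topic, word) -> unscaled statistic

    def step_size(self):
        # Robbins-Monro schedule rho_t = (t + tau0) ** (-kappa).
        return (self.t + self.tau0) ** (-self.kappa)

    def value(self, k, w):
        # Current variational parameter lambda[k, w]; .get avoids inserting keys.
        return self.eta + self.scale * self.stats.get((k, w), 0.0)

    def update(self, counts, batch_size):
        """Apply lambda <- (1 - rho) lambda + rho (eta + D/|B| * counts).

        counts maps (topic, word) to counts from the minibatch's local
        inference step, which this sketch assumes is supplied externally.
        """
        rho = self.step_size()
        self.t += 1
        self.scale *= 1.0 - rho            # decay every entry at O(1) cost
        weight = rho * self.num_docs / batch_size
        for key, c in counts.items():      # touch only observed entries
            self.stats[key] += weight * c / self.scale
        if self.scale < 1e-100:
            self._rescale()

    def _rescale(self):
        # Fold the multiplier into the stored entries before it underflows.
        for key in self.stats:
            self.stats[key] *= self.scale
        self.scale = 1.0


# Usage with a made-up feature key and toy sizes.
model = SparseTopicWordSVI(eta=0.1, num_docs=10, tau0=2.0, kappa=1.0)
model.update({(0, "feat_a"): 2.0}, batch_size=5)
print(model.value(0, "feat_a"))
```

The lazy scale trick is what keeps the per-minibatch cost proportional to the number of nonzero counts rather than to the full topic-by-vocabulary matrix; the occasional `_rescale` pass is linear in the stored entries but runs only when the multiplier nears underflow.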

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods