Open Information Extraction on Scientific Text: An Evaluation

Paul Groth; Michael Lauruhn; Antony Scerri; Ron Daniel Jr

arXiv:1802.05574 · cs.CL · August 23, 2018 · 5 cites

Open Access

TL;DR

This paper evaluates open information extraction (OIE) systems on scientific text from 10 disciplines and finds that they perform significantly worse than on encyclopedic text. It also provides an error analysis that suggests areas for future improvement.

Contribution

It presents a crowd-sourced evaluation of two state-of-the-art OIE systems on scientific text, analyzes their errors, and makes the corpus of sentences and judgments available for further research.

Findings

1. OIE systems perform significantly worse on scientific text than on encyclopedic text.

2. The error analysis suggests areas of work to reduce extraction errors.

3. A corpus of scientific sentences and judgments is made available.

Abstract

Open Information Extraction (OIE) is the task of the unsupervised creation of structured information from text. OIE is often used as a starting point for a number of downstream tasks, including knowledge base construction, relation extraction, and question answering. While OIE methods are intended to be domain independent, they have been evaluated primarily on newspaper, encyclopedic, or general web text. In this article, we evaluate the performance of OIE on scientific texts originating from 10 different disciplines. To do so, we evaluate two state-of-the-art OIE systems using a crowd-sourcing approach. We find that OIE systems perform significantly worse on scientific text than on encyclopedic text. We also provide an error analysis and suggest areas of work to reduce errors. Our corpus of sentences and judgments is made available.

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Software Engineering Research