Open Information Extraction on Scientific Text: An Evaluation
Paul Groth, Michael Lauruhn, Antony Scerri, Ron Daniel Jr

TL;DR
This paper evaluates two open information extraction systems on scientific texts from 10 disciplines, finding that they perform significantly worse than on encyclopedic text, and provides an error analysis to guide future improvements.
Contribution
It presents a comprehensive evaluation of OIE systems on scientific texts, highlighting their limitations and offering a new dataset for further research.
Findings
OIE systems perform significantly worse on scientific text than on encyclopedic text.
Error analysis identifies key areas for improving OIE accuracy.
A new corpus of scientific sentences and judgments is provided.
Abstract
Open Information Extraction (OIE) is the task of the unsupervised creation of structured information from text. OIE is often used as a starting point for a number of downstream tasks including knowledge base construction, relation extraction, and question answering. While OIE methods are targeted at being domain independent, they have been evaluated primarily on newspaper, encyclopedic or general web text. In this article, we evaluate the performance of OIE on scientific texts originating from 10 different disciplines. To do so, we assess two state-of-the-art OIE systems using a crowd-sourcing approach. We find that OIE systems perform significantly worse on scientific text than on encyclopedic text. We also provide an error analysis and suggest areas of work to reduce errors. Our corpus of sentences and judgments is made available.
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Software Engineering Research
