IELM: An Open Information Extraction Benchmark for Pre-Trained Language Models

Chenguang Wang; Xiao Liu; Dawn Song

arXiv:2210.14128·cs.CL·October 26, 2022

TL;DR

This paper introduces IELM, an open information extraction (OIE) benchmark for pre-trained language models (LMs). It shows that these models, used as zero-shot OIE systems, achieve competitive results on two standard OIE datasets (CaRB and Re-OIE2016) and two new factual OIE datasets (TAC KBP-OIE and Wikidata-OIE).

Contribution

The paper creates an open information extraction (OIE) benchmark for pre-trained LMs and shows that, once turned into zero-shot OIE systems, they extract open relational information effectively without OIE training.

Findings

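1. Pre-trained LMs achieve competitive zero-shot OIE performance on the standard OIE datasets (CaRB and Re-OIE2016).

2. Zero-shot pre-trained LMs outperform supervised methods on the factual OIE datasets.

3. The benchmark includes two new large-scale factual OIE datasets, TAC KBP-OIE and Wikidata-OIE, built via distant supervision.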

Abstract

We introduce a new open information extraction (OIE) benchmark for pre-trained language models (LMs). Recent studies have demonstrated that pre-trained LMs, such as BERT and GPT, may store linguistic and relational knowledge. In particular, LMs are able to answer "fill-in-the-blank" questions when given a pre-defined relation category. Instead of focusing on pre-defined relations, we create an OIE benchmark aiming to fully examine the open relational information present in the pre-trained LMs. We accomplish this by turning pre-trained LMs into zero-shot OIE systems. Surprisingly, pre-trained LMs are able to obtain competitive performance on both standard OIE datasets (CaRB and Re-OIE2016) and two new large-scale factual OIE datasets (TAC KBP-OIE and Wikidata-OIE) that we establish via distant supervision. For instance, the zero-shot pre-trained LMs outperform the F1 score of the…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Cosine Annealing · Byte Pair Encoding · Residual Connection · Dropout · Dense Connections