From Dataset Recycling to Multi-Property Extraction and Beyond

Tomasz Dwojak; Michał Pietruszka; Łukasz Borchmann; Jakub Chłędowski; Filip Graliński

arXiv:2011.03228·cs.CL·November 9, 2020

TL;DR

This paper evaluates Transformer architectures on WikiReading, introduces WikiReading Recycled (a public dataset for multi-property extraction), and provides a human-annotated test set with diagnostic subsets for detailed analysis of model performance.

Contribution

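The paper introduces WikiReading Recycled, a public dataset that uses the same data as WikiReading without inheriting its predecessor's identified disadvantages, and defines the task of multiple property extraction, evaluated on a human-annotated test set with diagnostic subsets.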

Findings

01. Dual-source Transformer model outperforms the previous state of the art

02. WikiReading Recycled dataset improves data quality and diversity

03. Diagnostic subsets enable detailed performance analysis

Abstract

This paper investigates various Transformer architectures on the WikiReading Information Extraction and Machine Reading Comprehension dataset. The proposed dual-source model outperforms the current state-of-the-art by a large margin. Next, we introduce WikiReading Recycled (a newly developed public dataset) and the task of multiple property extraction. The new dataset uses the same data as WikiReading but does not inherit its predecessor's identified disadvantages. In addition, we provide a human-annotated test set with diagnostic subsets for a detailed analysis of model performance.

Code & Models

Repositories

applicaai/multi-property-extraction (official)

Taxonomy

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization · Adam · Attention Is All You Need · Multi-Head Attention · Byte Pair Encoding · Residual Connection · Dropout