Re-TACRED: Addressing Shortcomings of the TACRED Dataset

George Stoica; Emmanouil Antonios Platanios; Barnabás Póczos

arXiv:2104.08398 · cs.CL · April 20, 2021

TL;DR

This paper re-annotates the full TACRED dataset with an improved crowdsourcing strategy and finds that 23.9% of its labels are incorrect. The corrected dataset, Re-TACRED, provides a more reliable benchmark for evaluating relation extraction models.

Contribution

It introduces Re-TACRED, a fully re-annotated version of TACRED, and analyzes how correcting the labels changes measured model performance.

Findings

01

23.9% of TACRED labels are incorrect

02

Model F1 score improves by an average of 14.3% on Re-TACRED

03

Evaluating on the re-annotated data uncovers significant relationships between models

Abstract

TACRED is one of the largest and most widely used sentence-level relation extraction datasets. Proposed models that are evaluated using this dataset consistently set new state-of-the-art performance. However, they still exhibit large error rates despite leveraging external knowledge and unsupervised pretraining on large text corpora. A recent study suggested that this may be due to poor dataset quality. The study observed that over 50% of the most challenging sentences from the development and test sets are incorrectly labeled and account for an average drop of 8% f1-score in model performance. However, this study was limited to a small biased sample of 5k (out of a total of 106k) sentences, substantially restricting the generalizability and broader implications of its findings. In this paper, we address these shortcomings by: (i) performing a comprehensive study over the whole TACRED…
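The label error rate quoted above is, at its core, the fraction of sentences whose relation label changed after re-annotation. The snippet below is a minimal sketch of that arithmetic, not the authors' pipeline: the function name `label_change_rate` and the four-example toy lists are hypothetical and are not taken from the paper or the dataset files.

```python
from typing import Sequence


def label_change_rate(original: Sequence[str], revised: Sequence[str]) -> float:
    """Return the fraction of positions where the revised label differs.

    Both sequences are assumed to be aligned, i.e. position i in each
    refers to the same sentence.
    """
    if len(original) != len(revised):
        raise ValueError("label sequences must have the same length")
    if not original:
        return 0.0
    changed = sum(o != r for o, r in zip(original, revised))
    return changed / len(original)


# Toy, made-up labels for illustration only (not real dataset rows).
toy_original = ["per:title", "no_relation", "org:founded_by", "no_relation"]
toy_revised = ["per:title", "per:employee_of", "org:founded_by", "no_relation"]

rate = label_change_rate(toy_original, toy_revised)
print(f"{rate:.1%} of toy labels changed")
```

On the toy lists, one of the four labels differs, so the function returns 0.25. Applying the same ratio to aligned original and revised labels for a whole dataset gives a figure of the same kind as the one the paper reports.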

Code & Models

Repositories

gstoica27/Re-TACRED
PyTorch · Official

Datasets

DFKI-SLT/tacred
dataset · 155 downloads

Videos

Re-TACRED: Addressing Shortcomings of the TACRED Dataset · Underline