Beyond the Numbers: Transparency in Relation Extraction Benchmark Creation and Leaderboards

Varvara Arzt; Allan Hanbury

arXiv:2411.05224 · cs.CL · November 11, 2024

TL;DR

This paper critically examines transparency issues in relation extraction (RE) benchmarks and leaderboards, highlights their limitations, and advocates better documentation and evaluation practices so that reported gains reflect genuine progress.

Contribution

The paper identifies key transparency shortcomings in RE benchmarks and leaderboards and proposes improvements to documentation and evaluation so that model performance can be assessed more reliably.

Findings

01

Widely used RE benchmarks such as TACRED and NYT tend to be highly imbalanced and contain noisy labels

02

Current leaderboards rely mainly on aggregate metrics like F1-score

03

Class-based performance metrics are often missing, obscuring true model capabilities
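Findings 02 and 03 can be illustrated with a minimal sketch. The labels and predictions below are invented toy data, not results from the paper, and the helper names (per_class_f1, micro_f1, macro_f1) are hypothetical. The sketch assumes a single-label setting in which every class, including no_relation, counts towards the aggregate score; under that assumption micro-F1 equals accuracy.

```python
from collections import Counter


def per_class_f1(gold, pred):
    """Return {label: (precision, recall, f1)} for every label in gold or pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[g] += 1  # gold label g was missed
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        n_pred = tp[label] + fp[label]
        n_gold = tp[label] + fn[label]
        prec = tp[label] / n_pred if n_pred else 0.0
        rec = tp[label] / n_gold if n_gold else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[label] = (prec, rec, f1)
    return scores


def micro_f1(gold, pred):
    """Micro-F1 over all classes (equals accuracy for single-label data)."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(scores):
    """Unweighted mean of the per-class F1 values."""
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Toy, heavily imbalanced data (hypothetical, for illustration only).
gold = ["no_relation"] * 90 + ["per:title"] * 8 + ["org:founded_by"] * 2
pred = (
    ["no_relation"] * 90    # majority class: all correct
    + ["per:title"] * 6     # 6 of 8 per:title instances found
    + ["no_relation"] * 2   # 2 per:title instances missed
    + ["no_relation"] * 2   # both org:founded_by instances missed
)

scores = per_class_f1(gold, pred)
print(f"micro-F1: {micro_f1(gold, pred):.3f}")
print(f"macro-F1: {macro_f1(scores):.3f}")
for label, (prec, rec, f1) in scores.items():
    print(f"{label:16s} P={prec:.3f} R={rec:.3f} F1={f1:.3f}")
```

In this toy setup the aggregate micro-F1 is high even though the rarest class is never predicted correctly, which is the kind of gap a per-class breakdown on a leaderboard would expose.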

Abstract

This paper investigates the transparency in the creation of benchmarks and the use of leaderboards for measuring progress in NLP, with a focus on the relation extraction (RE) task. Existing RE benchmarks often suffer from insufficient documentation, lacking crucial details such as data sources, inter-annotator agreement, the algorithms used for the selection of instances for datasets, and information on potential biases like dataset imbalance. Progress in RE is frequently measured by leaderboards that rank systems based on evaluation methods, typically limited to aggregate metrics like F1-score. However, the absence of detailed performance analysis beyond these metrics can obscure the true generalisation capabilities of models. Our analysis reveals that widely used RE benchmarks, such as TACRED and NYT, tend to be highly imbalanced and contain noisy labels. Moreover, the lack of…

Taxonomy

Topics: Semantic Web and Ontologies

Methods: Focus