Pre-trained Language Models as Re-Annotators

Chang Shu

arXiv:2205.05368·cs.CL·May 12, 2022

TL;DR

This paper uses pre-trained language models to automatically detect and correct annotation errors in datasets; the denoised data improves downstream relation extraction performance by up to 3.6%.

Contribution

It proposes a novel approach leveraging semantic annotation representations, credibility scoring, and contrastive learning for automated annotation noise reduction.

Findings

01

Credibility scores align well with human revisions, achieving high binary F1 scores.

02

Neighbour-aware classifiers improve annotation correction accuracy.

03

Automatically denoised datasets lead to up to 3.6% performance gains in relation extraction.

Abstract

Annotation noise is widespread in datasets, but manually revising a flawed corpus is time-consuming and error-prone. Hence, given the prior knowledge in Pre-trained Language Models and the expected uniformity across all annotations, we attempt to reduce annotation noise in the corpus automatically through two tasks: (1) Annotation Inconsistency Detection, which indicates the credibility of annotations, and (2) Annotation Error Correction, which rectifies the abnormal annotations. We investigate how to acquire semantic-sensitive annotation representations from Pre-trained Language Models, expecting examples with identical annotations to be embedded at mutually adjacent positions even without fine-tuning. We propose a novel credibility score that reveals the likelihood of annotation inconsistencies based on neighbouring consistency. Then, we fine-tune the Pre-trained Language Models…
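
The neighbouring-consistency idea in the abstract can be illustrated with a minimal sketch. It assumes each annotated example has already been embedded (for instance by a pre-trained language model) and scores an annotation by the share of its k nearest neighbours, under cosine similarity, that carry the same label. The function names, the unweighted agreement ratio, the majority-vote suggestion, and the toy vectors are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np


def knn_credibility(embeddings, labels, k=3):
    """Hypothetical credibility score: fraction of the k nearest
    neighbours (cosine similarity) that share each example's label.
    Assumes no embedding is the zero vector."""
    X = np.asarray(embeddings, dtype=float)
    y = np.asarray(labels)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalise rows
    sim = X @ X.T                                     # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)                    # an example is not its own neighbour
    nn = np.argsort(-sim, axis=1)[:, :k]              # indices of the k most similar examples
    scores = (y[nn] == y[:, None]).mean(axis=1)       # agreement ratio per example
    return scores, nn


def neighbour_vote(labels, nn):
    """Hypothetical correction rule: majority label among the neighbours."""
    y = np.asarray(labels)
    suggested = []
    for row in nn:
        values, counts = np.unique(y[row], return_counts=True)
        suggested.append(values[np.argmax(counts)])
    return np.array(suggested)


# Toy data (assumed): two tight clusters; example 3 sits in the first
# cluster but carries the second cluster's label.
embeddings = [
    [1.00, 0.00], [0.90, 0.10], [0.95, 0.05], [0.98, 0.02],
    [0.00, 1.00], [0.10, 0.90], [0.05, 0.95], [0.02, 0.98],
]
labels = ["A", "A", "A", "B", "B", "B", "B", "B"]

scores, nn = knn_credibility(embeddings, labels, k=3)
suggested = neighbour_vote(labels, nn)
for i, (lab, s, sug) in enumerate(zip(labels, scores, suggested)):
    print(i, lab, round(float(s), 2), sug)
```

In this toy setting the mislabelled example receives the lowest score, and its neighbours vote for the other label. A low score only flags an annotation as inconsistent with its neighbourhood; whether it is actually wrong still depends on how well the embeddings separate the classes.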

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Contrastive Learning