Pre-trained Language Models as Re-Annotators
Chang Shu

TL;DR
This paper introduces a method that uses pre-trained language models to automatically detect and correct annotation errors in datasets, improving data quality and yielding gains of up to 3.6% in downstream relation extraction.
Contribution
It proposes a novel approach leveraging semantic annotation representations, credibility scoring, and contrastive learning for automated annotation noise reduction.
Findings
Credibility scores align well with human revisions, achieving high binary F1 scores.
Neighbour-aware classifiers improve annotation correction accuracy.
Automatically denoised datasets lead to up to 3.6% performance gains in relation extraction.
Abstract
Annotation noise is widespread in datasets, but manually revising a flawed corpus is time-consuming and error-prone. Hence, given the prior knowledge in Pre-trained Language Models and the expected uniformity across all annotations, we attempt to reduce annotation noise in the corpus automatically through two tasks: (1) Annotation Inconsistency Detection, which indicates the credibility of annotations, and (2) Annotation Error Correction, which rectifies the abnormal annotations. We investigate how to acquire semantic-sensitive annotation representations from Pre-trained Language Models, expecting to embed examples with identical annotations at mutually adjacent positions even without fine-tuning. We propose a novel credibility score that reveals the likelihood of annotation inconsistencies based on neighbouring consistency. Then, we fine-tune the Pre-trained Language Models…
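The abstract describes the credibility score only as being based on neighbouring consistency, so the following is a minimal sketch of that general idea and not the paper's formula. It assumes each annotated example already has an embedding vector, uses cosine similarity to find the k nearest neighbours, and scores an annotation by the fraction of those neighbours that share its label. The function names, the value of k, and the toy vectors and labels are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def credibility_scores(embeddings, labels, k=2):
    """Illustrative score: share of the k nearest neighbours with the same label.

    This is an assumed stand-in for the paper's credibility score, which the
    abstract does not define in detail.
    """
    scores = []
    for i, (vec, label) in enumerate(zip(embeddings, labels)):
        # Similarity to every other example, paired with that example's label.
        sims = [
            (cosine(vec, other), labels[j])
            for j, other in enumerate(embeddings)
            if j != i
        ]
        sims.sort(key=lambda pair: pair[0], reverse=True)
        neighbours = sims[:k]
        agree = sum(1 for _, other_label in neighbours if other_label == label)
        scores.append(agree / len(neighbours) if neighbours else 0.0)
    return scores


# Hypothetical 2-D embeddings; in practice these would come from a pre-trained model.
toy_embeddings = [
    [1.0, 0.0],    # 0
    [0.9, 0.1],    # 1
    [0.95, 0.05],  # 2: sits among the "founder" examples but is labelled "employee"
    [0.0, 1.0],    # 3
    [0.1, 0.9],    # 4
    [0.05, 0.95],  # 5
]
toy_labels = ["founder", "founder", "employee", "employee", "employee", "employee"]

scores = credibility_scores(toy_embeddings, toy_labels, k=2)
suspect = min(range(len(scores)), key=scores.__getitem__)
print(scores, suspect)
```

On this toy data, example 2 receives the lowest score because both of its nearest neighbours carry a different label, which is the kind of inconsistency the detection task is meant to surface. A real pipeline would likely weight neighbours by similarity and tune k on held-out human revisions.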
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Contrastive Learning
