H-COAL: Human Correction of AI-Generated Labels for Biomedical Named Entity Recognition

Xiaojing Duan; John P. Lalor

arXiv:2311.11981 · cs.CL · November 21, 2023 · 1 citation

Open Access

TL;DR

H-COAL is a framework that leverages human correction of AI-generated labels to efficiently improve biomedical named entity recognition, significantly reducing human effort while approaching expert-level performance.

Contribution

This work introduces a novel framework for selectively correcting AI-generated labels, demonstrating substantial performance gains with minimal human effort.

Findings

01

Correcting 5% of labels closes up to 64% of the AI-human performance gap (relative improvement).

02

Correcting 20% of labels closes up to 86% of the AI-human performance gap (relative improvement).

03

Selectively correcting ranked AI-generated labels approaches gold-standard performance (100% human labeling) with significantly less human effort.
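
One plausible reading of "closing the performance gap" is the share of the difference between the AI-only score and the fully human-labeled score that is recovered after correction. The sketch below assumes that reading; the function name `gap_closed` and the example scores are hypothetical and are not taken from the paper.

```python
def gap_closed(ai_score: float, corrected_score: float, human_score: float) -> float:
    """Fraction of the AI-to-human performance gap recovered after correction.

    Assumed definition (not confirmed by the source):
        (corrected - ai) / (human - ai)
    """
    gap = human_score - ai_score
    if gap <= 0:
        raise ValueError("human_score must be greater than ai_score")
    return (corrected_score - ai_score) / gap


# Illustrative scores only; these are not results reported in the paper.
share = gap_closed(ai_score=0.5, corrected_score=0.75, human_score=1.0)
print(f"gap closed: {share:.0%}")
```

Under this definition, a value of 0 means correction did not help at all and a value of 1 means the corrected labels match fully human-labeled performance.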

Abstract

With the rapid advancement of machine learning models for NLP tasks, collecting high-fidelity labels from AI models is a realistic possibility. Firms now make AI available to customers via predictions as a service (PaaS). This includes PaaS products for healthcare. It is unclear whether these labels can be used for training a local model without expensive annotation checking by in-house experts. In this work, we propose a new framework for Human Correction of AI-Generated Labels (H-COAL). By ranking AI-generated outputs, one can selectively correct labels and approach gold standard performance (100% human labeling) with significantly less human effort. We show that correcting 5% of labels can close the AI-human performance gap by up to 64% relative improvement, and correcting 20% of labels can close the performance gap by up to 86% relative improvement.
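
The selective-correction idea in the abstract can be sketched as follows. The abstract does not state the ranking criterion, so this example assumes each AI-generated label comes with a confidence score and that the least confident items are sent to a human first. The names `select_for_correction`, `apply_corrections`, and `human_label_for` are hypothetical, and the data is made up for illustration.

```python
from typing import Callable, List, Sequence


def select_for_correction(confidences: Sequence[float], budget: float) -> List[int]:
    """Return indices of the lowest-confidence items, limited to a fraction `budget`.

    Ranking by confidence is an assumption; the source only says outputs are ranked.
    """
    if not 0.0 <= budget <= 1.0:
        raise ValueError("budget must be a fraction between 0 and 1")
    n_correct = round(len(confidences) * budget)
    # Sort item indices from least to most confident.
    ranked = sorted(range(len(confidences)), key=lambda i: confidences[i])
    return ranked[:n_correct]


def apply_corrections(
    ai_labels: Sequence[str],
    chosen: Sequence[int],
    human_label_for: Callable[[int], str],
) -> List[str]:
    """Replace the AI label with a human label at each chosen index."""
    corrected = list(ai_labels)
    for i in chosen:
        corrected[i] = human_label_for(i)
    return corrected


# Toy data: five tokens with AI labels, confidences, and (hypothetical) expert labels.
ai_labels = ["O", "O", "B", "O", "B"]
confidences = [0.9, 0.2, 0.7, 0.1, 0.95]
expert_labels = ["O", "B", "B", "B", "B"]

chosen = select_for_correction(confidences, budget=0.4)
corrected = apply_corrections(ai_labels, chosen, lambda i: expert_labels[i])
```

The corrected labels would then be used to train the local model, and the budget (for example 5% or 20%) controls how much expert time is spent.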

Taxonomy

Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Natural Language Processing Techniques
