CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation

Minzhi Li; Taiwei Shi; Caleb Ziems; Min-Yen Kan; Nancy F. Chen; Zhengyuan Liu; Diyi Yang

arXiv:2310.15638 · cs.CL · March 18, 2024 · 1 citation

TL;DR

CoAnnotating introduces an uncertainty-guided framework for efficiently allocating annotation tasks between humans and large language models, improving annotation quality and reducing costs in NLP data labeling.

Contribution

This work presents a novel uncertainty-based approach for human-LLM collaboration in data annotation, optimizing work distribution for better performance and cost-effectiveness.

Findings

1. Up to 21% performance improvement over random allocation
2. Effective work allocation across multiple datasets
3. Demonstrates the potential of LLMs as complementary annotators

Abstract

Annotated data plays a critical role in Natural Language Processing (NLP), both in training models and in evaluating their performance. Given recent developments in Large Language Models (LLMs), models such as ChatGPT demonstrate zero-shot capability on many text-annotation tasks, comparable with or even exceeding that of human annotators. Such LLMs can serve as alternatives to manual annotation because of their lower cost and higher scalability. However, little work has leveraged LLMs as complementary annotators or explored how annotation work is best allocated among humans and LLMs to achieve both quality and cost objectives. We propose CoAnnotating, a novel paradigm for Human-LLM co-annotation of unstructured texts at scale. Under this framework, we utilize uncertainty to estimate LLMs' annotation capability. Our empirical study shows CoAnnotating to be an effective means to allocate work from results on…
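
The abstract does not say how uncertainty is computed or how the allocation threshold is chosen, so the following is only a minimal sketch of the general idea, under stated assumptions: the LLM is queried several times per item, uncertainty is taken to be the Shannon entropy of the returned labels, and items above a chosen entropy threshold are routed to humans. The function names `label_entropy` and `allocate`, the sample labels, and the threshold value are all hypothetical and are not taken from the paper or its repository.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels.

    Assumption: `labels` holds several LLM annotations for one item,
    for example obtained from repeated or reworded prompts.
    """
    counts = Counter(labels)
    total = len(labels)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def allocate(llm_labels_by_item, threshold):
    """Route low-entropy items to the LLM and the rest to human annotators.

    `threshold` is a hypothetical tuning knob, not a value from the paper.
    """
    return {
        item_id: "llm" if label_entropy(labels) <= threshold else "human"
        for item_id, labels in llm_labels_by_item.items()
    }


# Made-up example: four LLM responses per item.
samples = {
    "a": ["pos", "pos", "pos", "pos"],  # full agreement, entropy 0
    "b": ["pos", "neg", "pos", "neg"],  # even split, entropy 1 bit
    "c": ["pos", "pos", "pos", "neg"],  # partial agreement
}
plan = allocate(samples, threshold=0.5)
print(plan)
```

Raising the threshold sends more items to the LLM, which lowers cost at the risk of lower quality; lowering it does the opposite. How that trade-off is actually set in CoAnnotating is not recoverable from the truncated abstract.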

Code & Models

Repositories

salt-nlp/coannotating (official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)