AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators
Xingwei He, Zhenghao Lin, Yeyun Gong, A-Long Jin, Hang Zhang, Chen Lin, Jian Jiao, Siu Ming Yiu, Nan Duan, Weizhu Chen

TL;DR
This paper introduces AnnoLLM, a system that uses large language models like GPT-3.5 to perform data annotation tasks effectively, reducing reliance on human annotators and achieving high-quality labeled datasets.
Contribution
The paper proposes a novel explain-then-annotate approach leveraging LLMs for data annotation, demonstrating its effectiveness across multiple NLP tasks and creating a new conversation-based retrieval dataset.
Findings
AnnoLLM matches or surpasses crowdsourced annotators in annotation quality.
The system reduces annotation costs and time.
A new high-quality conversational retrieval dataset was created.
Abstract
Many natural language processing (NLP) tasks rely on labeled data to train machine learning models with high performance. However, data annotation is time-consuming and expensive, especially when the task involves a large amount of data or requires specialized domains. Recently, GPT-3.5 series models have demonstrated remarkable few-shot and zero-shot ability across various NLP tasks. In this paper, we first claim that large language models (LLMs), such as GPT-3.5, can serve as an excellent crowdsourced annotator when provided with sufficient guidance and demonstrated examples. Accordingly, we propose AnnoLLM, an annotation system powered by LLMs, which adopts a two-step approach, explain-then-annotate. Concretely, we first prompt LLMs to provide explanations for why the specific ground truth answer/label was assigned for a given example. Then, we construct the few-shot chain-of-thought…
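The two-step explain-then-annotate idea described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the `llm` callable (any function mapping a prompt string to a completion string), and the helper names `explain_prompt`, `build_cot_prompt`, and `annotate` are all assumptions introduced here, as is the convention that the model ends its answer with a `Label:` line.

```python
from typing import Callable, List, Tuple

# Hypothetical types: an example is (input_text, gold_label); `llm` is any
# callable that maps a prompt string to a completion string.
Example = Tuple[str, str]
LLM = Callable[[str], str]


def explain_prompt(task: str, text: str, label: str) -> str:
    """Step 1: ask the model why the gold label fits this example."""
    return (
        f"{task}\n\nInput: {text}\nLabel: {label}\n"
        "Briefly explain why this label is correct."
    )


def build_cot_prompt(
    task: str, demos: List[Tuple[str, str, str]], query: str
) -> str:
    """Step 2: few-shot prompt where each demo shows explanation, then label."""
    parts = [task]
    for text, explanation, label in demos:
        parts.append(
            f"Input: {text}\nExplanation: {explanation}\nLabel: {label}"
        )
    # Leave the explanation open so the model reasons before it labels.
    parts.append(f"Input: {query}\nExplanation:")
    return "\n\n".join(parts)


def annotate(llm: LLM, task: str, labeled: List[Example], query: str) -> str:
    """Generate explanations for labeled demos, then annotate one new input."""
    demos = [
        (text, llm(explain_prompt(task, text, label)).strip(), label)
        for text, label in labeled
    ]
    completion = llm(build_cot_prompt(task, demos, query))
    # Assumes the completion ends with a "Label: <value>" line.
    return completion.rsplit("Label:", 1)[-1].strip()
```

In use, `labeled` would hold a handful of gold examples and `llm` would wrap whatever model client is available; parsing the final label from free text is the fragile part and would likely need task-specific handling.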
Taxonomy
Topics: Topic Modeling · Mobile Crowdsensing and Crowdsourcing · Artificial Intelligence in Healthcare and Education
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Cosine Annealing · Linear Warmup With Cosine Annealing · Dense Connections · Layer Normalization
