Refining and Reusing Annotation Guidelines for LLM Annotation

Kon Woo Kim; Jin-Dong Kim; Akiko Aizawa

arXiv:2605.20809·cs.CL·May 21, 2026

TL;DR

This paper presents an iterative moderation framework for refining annotation guidelines to improve LLM performance on specialized biomedical NER tasks, demonstrating empirical success across multiple models and datasets.

Contribution

The paper proposes the systematic reuse and refinement of annotation guidelines as an alignment mechanism for LLMs, implemented as an iterative moderation framework that simulates the early phases of annotation projects.

Findings

01. Guideline integration improves LLM annotation quality.

02. Reasoning-optimized models outperform standard models.

03. Moderation under minimal supervision is feasible.

Abstract

While Large Language Models (LLMs) demonstrate remarkable performance on zero-shot annotation tasks, they often struggle with the specialized conventions of gold-standard benchmarks. We propose the systematic reuse and refinement of annotation guidelines as an alignment mechanism, introducing an iterative moderation framework that simulates the early phases of annotation projects. We evaluate three hypotheses: (1) the efficacy of guideline integration, (2) the advantage of reasoning-optimized models, and (3) the viability of moderation under minimal supervision. Testing on biomedical NER tasks (NCBI Disease, BC5CDR, BioRED) with three LLM families (GPT, Gemini, DeepSeek), our results empirically confirm all three hypotheses. While the iterative moderation framework shows promise for refining guidelines, our analysis also reveals substantial room for improvement.
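
The abstract does not spell out how the moderation loop works, so the following is only a minimal sketch of one plausible reading: annotate a small supervised set under the current guideline, collect the disagreements with the gold annotations, and revise the guideline until the disagreements disappear or a round limit is reached. Every name here (`moderate`, `toy_annotate`, `toy_revise`, the example sentence, the appended rule) is hypothetical and not taken from the paper. The two toy functions are rule-based stand-ins for what would presumably be LLM calls.

```python
from typing import Callable, List, Set, Tuple

Span = Tuple[int, int, str]                      # (start, end, label)
Example = Tuple[str, Set[Span]]                  # (text, gold spans)
Disagreement = Tuple[str, Set[Span], Set[Span]]  # (text, gold, predicted)


def moderate(
    guideline: str,
    annotate: Callable[[str, str], Set[Span]],
    revise: Callable[[str, List[Disagreement]], str],
    dev_set: List[Example],
    max_rounds: int = 3,
) -> Tuple[str, List[int]]:
    """Refine a guideline until predictions match the small gold set.

    Returns the final guideline and the disagreement count per round.
    """
    history: List[int] = []
    for _ in range(max_rounds):
        disagreements: List[Disagreement] = []
        for text, gold in dev_set:
            predicted = annotate(guideline, text)
            if predicted != gold:
                disagreements.append((text, gold, predicted))
        history.append(len(disagreements))
        if not disagreements:
            break  # guideline already reproduces the gold annotations
        guideline = revise(guideline, disagreements)
    return guideline, history


def toy_annotate(guideline: str, text: str) -> Set[Span]:
    """Stand-in annotator (assumption): tags known terms as Disease."""
    terms = ["cystic fibrosis", "fever"]
    if "Do not annotate symptoms" in guideline:
        terms.remove("fever")
    spans: Set[Span] = set()
    for term in terms:
        start = text.find(term)
        if start != -1:
            spans.add((start, start + len(term), "Disease"))
    return spans


def toy_revise(guideline: str, disagreements: List[Disagreement]) -> str:
    """Stand-in moderator (assumption): appends one fixed clarifying rule."""
    return guideline + " Do not annotate symptoms."


text = "Patients with cystic fibrosis often present with fever."
term = "cystic fibrosis"
gold = {(text.find(term), text.find(term) + len(term), "Disease")}

refined, history = moderate(
    "Annotate disease mentions.", toy_annotate, toy_revise, [(text, gold)]
)
print(refined)
print(history)
```

In this toy run the first round produces one disagreement (the symptom "fever" is tagged as a disease), the revision adds a rule excluding symptoms, and the second round produces none. Passing `annotate` and `revise` as callables keeps the loop independent of any particular model family.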
