DiZiNER: Disagreement-guided Instruction Refinement via Pilot Annotation Simulation for Zero-shot Named Entity Recognition
Siun Kim, Hyung-Jin Yoon

TL;DR
DiZiNER simulates pilot annotation with multiple LLMs and uses the disagreements among them to refine task instructions, reaching zero-shot state of the art on 14 of 18 NER benchmarks.
Contribution
Introduces a disagreement-guided instruction refinement framework in which LLMs act as both annotators and supervisors, achieving state-of-the-art zero-shot NER results.
Findings
Achieves zero-shot SOTA on 14 out of 18 benchmarks.
Reduces the gap between zero-shot and supervised NER by over 11 points.
Disagreement among models correlates strongly with NER performance.
Abstract
Large language models (LLMs) have advanced information extraction (IE) by enabling zero-shot and few-shot named entity recognition (NER), yet their generative outputs still show persistent and systematic errors. Despite progress through instruction fine-tuning, zero-shot NER still lags far behind supervised systems. These recurring errors mirror inconsistencies observed in early-stage human annotation processes that resolve disagreements through pilot annotation. Motivated by this analogy, we introduce DiZiNER (Disagreement-guided Instruction Refinement via Pilot Annotation Simulation for Zero-shot Named Entity Recognition), a framework that simulates the pilot annotation process, employing LLMs to act as both annotators and supervisors. Multiple heterogeneous LLMs annotate shared texts, and a supervisor model analyzes inter-model disagreements to refine task instructions. Across 18…
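The annotate, compare, and refine loop described in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper: the annotators and the supervisor are toy stand-ins for LLM calls, the agreement score is a simple set-overlap F1, and the stopping rule (stop when no entity is disputed) is an assumption. The paper's actual prompts, agreement measure, and refinement procedure are not given in this summary.

```python
from itertools import combinations
from typing import Callable, List, Set, Tuple

Entity = Tuple[str, str]                       # (surface text, entity type)
Annotator = Callable[[str, str], Set[Entity]]  # (instructions, text) -> entities
Dispute = Tuple[str, Set[Entity]]              # (text, disputed entities)
Supervisor = Callable[[str, List[Dispute]], str]


def pairwise_agreement(a: Set[Entity], b: Set[Entity]) -> float:
    """Set-overlap F1 between two annotators (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def mean_agreement(annotations: List[Set[Entity]]) -> float:
    """Average pairwise agreement over all annotator pairs."""
    pairs = list(combinations(annotations, 2))
    if not pairs:
        return 1.0
    return sum(pairwise_agreement(a, b) for a, b in pairs) / len(pairs)


def disputed_entities(annotations: List[Set[Entity]]) -> Set[Entity]:
    """Entities proposed by at least one annotator but not by all."""
    return set.union(*annotations) - set.intersection(*annotations)


def refine_instructions(
    instructions: str,
    texts: List[str],
    annotators: List[Annotator],   # assumed non-empty
    supervisor: Supervisor,
    max_rounds: int = 3,
) -> str:
    """Re-annotate shared texts until no disputes remain (assumed stop rule)."""
    for _ in range(max_rounds):
        disputes: List[Dispute] = []
        for text in texts:
            annotations = [annotate(instructions, text) for annotate in annotators]
            disputed = disputed_entities(annotations)
            if disputed:
                disputes.append((text, disputed))
        if not disputes:
            break
        # The supervisor rewrites the instructions based on the disputes.
        instructions = supervisor(instructions, disputes)
    return instructions


# Toy stand-ins for LLM calls (hypothetical, for illustration only).
def annotator_a(instructions: str, text: str) -> Set[Entity]:
    return {("Apple", "ORG")} if "Apple" in text else set()


def annotator_b(instructions: str, text: str) -> Set[Entity]:
    if "Apple" not in text:
        return set()
    label = "ORG" if "company names are ORG" in instructions else "MISC"
    return {("Apple", label)}


def toy_supervisor(instructions: str, disputes: List[Dispute]) -> str:
    return instructions + " Clarification: company names are ORG."
```

In this toy setup the two annotators initially disagree on the type of "Apple", the supervisor appends one clarifying rule, and the second round produces no disputes, so the loop stops. `mean_agreement` is included only as one possible way to quantify disagreement between rounds.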
