From LLM-anation to LLM-orchestrator: Coordinating Small Models for Data Labeling

Yao Lu; Zhaiyuan Ji; Jiawei Du; Yu Shanqing; Qi Xuan; Tianyi Zhou

arXiv:2506.16393 · cs.CL · June 23, 2025

Open Access

TL;DR

This paper introduces AutoAnnotator, a multi-model cooperative annotation framework that reduces costs and improves accuracy in data labeling by combining large and small language models with reinforcement learning.

Contribution

The paper proposes a novel multi-model cooperative annotation paradigm and an automatic framework that integrates LLMs and SLMs with reinforcement learning for efficient data annotation.

Findings

01 AutoAnnotator reduces annotation costs by 74.15%.

02 AutoAnnotator improves annotation accuracy by 6.21%.

03 AutoAnnotator outperforms existing open-source and API LLMs in various settings.

Abstract

Although the annotation paradigm based on Large Language Models (LLMs) has made significant breakthroughs in recent years, its deployment still faces two core bottlenecks. First, calling commercial APIs for large-scale annotation is very expensive. Second, in scenarios that require fine-grained semantic understanding, such as sentiment classification and toxicity classification, the annotation accuracy of LLMs is even lower than that of Small Language Models (SLMs) dedicated to this field. To address these problems, we propose a new paradigm of multi-model cooperative annotation and design a fully automatic annotation framework, AutoAnnotator, based on it. Specifically, AutoAnnotator consists of two layers. The upper-level meta-controller layer uses the generation and reasoning capabilities of LLMs to select SLMs for annotation, automatically generate annotation code and…
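The abstract is cut off above, so the following is only a minimal sketch of the one step it does state: a controller selects SLMs for a task, and the selected SLMs annotate the samples. Everything concrete here is an assumption made for illustration and is not taken from the paper: the `SmallModel` interface, the `REGISTRY` lookup that stands in for the LLM-driven selection, the toy keyword models, and the majority-vote rule used to combine the SLM outputs.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical interface: a small model maps a text to a label.
SmallModel = Callable[[str], str]


def select_models(task: str, registry: Dict[str, Dict[str, SmallModel]]) -> List[SmallModel]:
    """Stand-in for the meta-controller's selection step.

    In the paper an LLM makes this choice; here it is a plain dictionary lookup.
    """
    return list(registry.get(task, {}).values())


def annotate(text: str, models: List[SmallModel]) -> str:
    """Label one sample by majority vote (an assumed aggregation rule)."""
    if not models:
        raise ValueError("no small models selected for this task")
    votes = Counter(model(text) for model in models)
    return votes.most_common(1)[0][0]


def keyword_model(positive_words: set) -> SmallModel:
    """Toy SLM: predicts 'positive' if any trigger word appears in the text."""
    def predict(text: str) -> str:
        words = set(text.lower().split())
        return "positive" if words & positive_words else "negative"
    return predict


# Hypothetical registry of task-specific small models.
REGISTRY = {
    "sentiment": {
        "slm_a": keyword_model({"great", "good"}),
        "slm_b": keyword_model({"great", "love"}),
        "slm_c": keyword_model({"fine"}),
    }
}

models = select_models("sentiment", REGISTRY)
labels = [annotate(t, models) for t in ["great movie", "boring plot"]]
print(labels)  # ['positive', 'negative']
```

In this toy run two of the three models vote "positive" on the first sample and all three vote "negative" on the second. Since the SLMs run locally, every sample labeled this way avoids a commercial API call, which is the cost concern the abstract raises.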

Taxonomy

Topics: Machine Learning and Data Classification · Artificial Intelligence in Healthcare and Education · Topic Modeling