Optimizing LLM Annotation of Classroom Discourse through Multi-Agent Orchestration
Bakhtawar Ahtisham, Kirk Vanacore, Rene F. Kizilcec

TL;DR
This paper introduces a hierarchical, multi-stage framework for improving the reliability of LLM-based classroom discourse annotation by mimicking human annotation workflows and explicitly modeling computational tradeoffs.
Contribution
It presents a novel multi-stage orchestration framework that enhances annotation reliability and validity for educational data using cost-aware LLM workflows.
Findings
Hierarchical framework improves annotation accuracy over single-pass models
Self-verification reduces errors in initial LLM annotations
Disagreement adjudication aligns model outputs with human expert standards
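The three findings above suggest a staged pipeline: independent first-pass labels, a self-verification pass over each label, and adjudication only when the verified labels disagree. The following is a minimal sketch under that reading, not the authors' implementation. Every name here (`orchestrate`, `Result`, the stub annotators, and counting model calls as the cost measure) is a hypothetical stand-in, and the stubs are plain Python functions rather than real LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical signatures: in practice each callable would wrap an LLM call.
Annotator = Callable[[str], str]               # utterance -> label
Verifier = Callable[[str, str], str]           # (utterance, label) -> checked label
Adjudicator = Callable[[str, List[str]], str]  # (utterance, labels) -> final label


@dataclass
class Result:
    label: str
    calls: int       # number of model calls spent, used here as a cost proxy
    escalated: bool  # True if the adjudication stage was needed


def orchestrate(utterance: str, annotators: List[Annotator],
                verify: Verifier, adjudicate: Adjudicator) -> Result:
    # Stage 1: each annotator labels the utterance independently.
    labels = [annotate(utterance) for annotate in annotators]
    calls = len(annotators)

    # Stage 2: self-verification re-checks (and may revise) each initial label.
    verified = [verify(utterance, label) for label in labels]
    calls += len(labels)

    # Stage 3: pay for adjudication only when the verified labels disagree.
    if len(set(verified)) == 1:
        return Result(verified[0], calls, escalated=False)
    return Result(adjudicate(utterance, verified), calls + 1, escalated=True)


# Illustrative stubs standing in for model calls (assumptions, not from the paper).
def punctuation_annotator(utterance: str) -> str:
    return "question" if utterance.strip().endswith("?") else "statement"


def keyword_annotator(utterance: str) -> str:
    return "question" if "why" in utterance.lower() else "statement"


def passthrough_verify(utterance: str, label: str) -> str:
    return label  # a real verifier would re-prompt the model to check its label


def majority_adjudicate(utterance: str, labels: List[str]) -> str:
    # Ties are broken alphabetically so the stub is deterministic.
    return max(sorted(set(labels)), key=labels.count)


if __name__ == "__main__":
    stubs = [punctuation_annotator, keyword_annotator]
    print(orchestrate("Why is that?", stubs, passthrough_verify, majority_adjudicate))
    print(orchestrate("Is that right?", stubs, passthrough_verify, majority_adjudicate))
```

In this sketch the extra adjudication call is spent only on items where the verified labels conflict, which is one simple way to make the reliability and cost tradeoff explicit.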
Abstract
Large language models (LLMs) are increasingly positioned as scalable tools for annotating educational data, including classroom discourse, interaction logs, and qualitative learning artifacts. Their ability to rapidly summarize instructional interactions and assign rubric-aligned labels has fueled optimism about reducing the cost and time associated with expert human annotation. However, growing evidence suggests that single-pass LLM outputs remain unreliable for high-stakes educational constructs that require contextual, pedagogical, or normative judgment, such as instructional intent or discourse moves. This tension between scale and validity sits at the core of contemporary education data science. In this work, we present and empirically evaluate a hierarchical, cost-aware orchestration framework for LLM-based annotation that improves reliability while explicitly modeling…
Taxonomy
Topics: Online Learning and Analytics · Intelligent Tutoring Systems and Adaptive Learning · Computational and Text Analysis Methods
