From Tool to Teammate: LLM Coding Agents as Collaborative Partners for Behavioral Labeling in Educational Dialogue Analysis

Eason Chen; Isabel Wang; Nina Yuan; Sophia Judicke; Kayla Beigh; Xinyi Tang

arXiv:2603.27440·cs.HC·March 31, 2026

TL;DR

This paper introduces LLM coding agents that iteratively refine the prompts used by LLM classifiers to label educational dialogues, reaching human-level reliability (cross-validated Cohen's kappa of 0.78) at a cost of roughly 5 to 8 US dollars per agent.

Contribution

It presents a novel iterative prompt refinement methodology using LLM agents for behavioral coding, demonstrating improved accuracy and insights in educational dialogue analysis.

Findings

01

Achieved a Cohen's kappa of 0.78, matching human inter-rater reliability.

02

Demonstrated the approach on 659 AI tutoring sessions across four experiments, using three agents and three classifiers.

03

Identified a new labeling pattern regarding expressions of confusion.

Abstract

Behavioral analysis of tutoring dialogues is essential for understanding student learning, yet manual coding remains a bottleneck. We present a methodology where LLM coding agents autonomously improve the prompts used by LLM classifiers to label educational dialogues. In each iteration, a coding agent runs the classifier against human-labeled validation data, analyzes disagreements, and proposes theory-grounded prompt modifications for researcher review. Applying this approach to 659 AI tutoring sessions across four experiments with three agents and three classifiers, 4-fold cross-validation on held-out data confirmed genuine improvement: the best agent achieved test $\kappa = 0.78$ (SD $= 0.08$), matching human inter-rater reliability ($\kappa = 0.78$), at a cost of approximately \$5 to \$8 per agent. While development-set performance reached $\kappa = 0.91$ to $0.93$, the cross-validated results…
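The agreement statistic reported in the abstract, Cohen's kappa, and the disagreement analysis that each iteration relies on can be illustrated with a minimal sketch. This is not the authors' code: the function names, the label names (`confusion`, `on_task`, `off_task`), and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def cohens_kappa(human, model):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(human) != len(model) or not human:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(human)
    # Observed agreement: fraction of items where both raters gave the same label.
    observed = sum(h == m for h, m in zip(human, model)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    human_counts = Counter(human)
    model_counts = Counter(model)
    expected = sum(human_counts[c] * model_counts[c] for c in human_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


def disagreements(human, model):
    """Tally (human_label, model_label) pairs where the two labels differ."""
    return Counter((h, m) for h, m in zip(human, model) if h != m)


# Hypothetical toy data: six dialogue turns with made-up behavior labels.
human = ["confusion", "on_task", "on_task", "confusion", "off_task", "on_task"]
model = ["on_task", "on_task", "on_task", "confusion", "off_task", "on_task"]

kappa = cohens_kappa(human, model)
top = disagreements(human, model).most_common(1)
print(f"kappa = {kappa:.3f}")
print("most frequent disagreement:", top)
```

In a refinement loop of the kind the abstract describes, a tally like `disagreements` would point to the label pairs that the prompt most often confuses, and kappa on held-out folds (rather than on the development set) would indicate whether a proposed prompt change generalizes.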
