Decoding Student Dialogue: A Multi-Dimensional Comparison and Bias Analysis of Large Language Models as Annotation Tools

Jie Cao; Zhanxin Hao; Jifan Yu

arXiv:2604.04370 · cs.HC · April 7, 2026

TL;DR

This study assesses GPT-5.2 and Gemini-3 for educational dialogue annotation, revealing context-dependent accuracy, bias patterns, and the importance of deployment considerations.

Contribution

It provides a comprehensive evaluation of large language models as annotation tools, highlighting their biases and performance variations across educational contexts.

Findings

01

Multi-agent prompting achieved the highest accuracy, but the difference did not reach statistical significance.

02

Accuracy was significantly higher on K-12 datasets than on university-level data.

03

Bias patterns include a consistent optimistic bias in Gemini-3's affective annotations and domain-specific under- and overestimation.

Abstract

Educational dialogue is critical for decoding student learning processes, yet manual annotation remains time-consuming. This study evaluates the efficacy of GPT-5.2 and Gemini-3 using three prompting strategies (few-shot, single-agent, and multi-agent reflection) across diverse subjects, educational levels, and four coding dimensions. Results indicate that while multi-agent prompting achieved the highest accuracy, the results did not reach statistical significance. Accuracy proved highly context-dependent, with significantly higher performance in K-12 datasets compared to university-level data, alongside disciplinary variations within the same educational level. Performance peaked in the affective dimension but remained lowest in the cognitive dimension. Furthermore, analysis revealed four bias patterns: (1) Gemini-3 exhibited a consistent optimistic bias in the affective dimension…
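
To make the two headline quantities concrete, the sketch below shows one way annotation accuracy and a directional ("optimistic") bias could be computed against human codes. It is an illustration only, not the authors' procedure: the function names `accuracy` and `signed_bias`, the 0-2 ordinal coding scale, and the toy labels are all assumptions introduced here.

```python
from typing import Sequence


def accuracy(human: Sequence[int], model: Sequence[int]) -> float:
    """Fraction of utterances where the model code equals the human code."""
    if len(human) != len(model) or not human:
        raise ValueError("label lists must be non-empty and equal in length")
    return sum(h == m for h, m in zip(human, model)) / len(human)


def signed_bias(human: Sequence[int], model: Sequence[int]) -> float:
    """Mean of (model - human) on an ordinal scale.

    A positive value means the model tends to code higher than the human
    annotator, which is one possible reading of an "optimistic" bias.
    """
    if len(human) != len(model) or not human:
        raise ValueError("label lists must be non-empty and equal in length")
    return sum(m - h for h, m in zip(human, model)) / len(human)


# Hypothetical toy data: affective codes on an assumed 0-2 ordinal scale.
human = [0, 1, 2, 1, 0, 2, 1, 1]
model = [1, 1, 2, 2, 0, 2, 1, 2]

print(accuracy(human, model))     # exact-match agreement with human codes
print(signed_bias(human, model))  # above zero: model codes higher on average
```

Reporting the signed difference alongside accuracy separates how often a model disagrees with human coders from the direction in which it disagrees, which is the distinction the bias analysis relies on.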
