LLM Reasoning Predicts When Models Are Right: Evidence from Coding Classroom Discourse

Bakhtawar Ahtisham; Kirk Vanacore; Zhuqian Zhou; Jinsook Lee; Rene F. Kizilcec

arXiv:2602.09832 · cs.CL · February 11, 2026

TL;DR

This study demonstrates that reasoning generated by Large Language Models can effectively predict the correctness of their own predictions in analyzing classroom dialogue, improving automated educational assessment.

Contribution

It introduces a reasoning-based approach using linguistic cues and supervised classifiers to detect errors in LLM predictions within educational dialogue analysis.

Findings

01. A Random Forest classifier achieved an F1 score of 0.83 (Recall = 0.854) in detecting incorrect LLM predictions.

02. Construct-specific linguistic cues improve detection performance.

03. Reasoning behind correct predictions uses grounded causal language, while reasoning behind incorrect predictions relies on hedging and metacognitive language.

Abstract

Large Language Models (LLMs) are increasingly deployed to automatically label and analyze educational dialogue at scale, yet current pipelines lack reliable ways to detect when models are wrong. We investigate whether reasoning generated by LLMs can be used to predict the correctness of a model's own predictions. We analyze 30,300 teacher utterances from classroom dialogue, each labeled by multiple state-of-the-art LLMs with an instructional move construct and an accompanying reasoning. Using human-verified ground-truth labels, we frame the task as predicting whether a model's assigned label for a given utterance is correct. We encode LLM reasoning using Term Frequency-Inverse Document Frequency (TF-IDF) and evaluate five supervised classifiers. A Random Forest classifier achieves an F1 score of 0.83 (Recall = 0.854), successfully identifying most incorrect predictions and outperforming…
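The encoding and scoring steps named in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' pipeline: the whitespace tokenizer, the smoothed IDF formula, the function names `tfidf` and `precision_recall_f1`, and the example reasoning strings are all assumptions made for illustration. In practice the resulting vectors would be passed to a supervised classifier such as a Random Forest, which is omitted here.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumes lowercase whitespace tokenization and a smoothed IDF,
    idf(t) = ln((1 + N) / (1 + df(t))) + 1; both are illustrative choices.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the class treated as positive.

    Here positive=1 stands for "the LLM's label was incorrect" (an assumption).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical reasoning strings, invented for this example only.
reasonings = [
    "because the teacher asked the student to explain",
    "maybe the label fits but it is unclear",
]
vectors = tfidf(reasonings)

# Hypothetical labels: 1 = LLM label was wrong, 0 = LLM label was right.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Terms that appear in every reasoning (such as "the") receive a lower weight than terms specific to one reasoning (such as "because"), which is what lets a downstream classifier pick up on distinctive cue words.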

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling · Text Readability and Simplification