A Confidence-Diversity Framework for Calibrating AI Judgement in Accessible Qualitative Coding Tasks
Zhilong Zhao, Yindi Liu

TL;DR
This paper introduces a confidence-diversity calibration framework for assessing LLM judgement in qualitative coding. It combines the models' self-reported confidence with the diversity of their labels to flag which coding decisions can be trusted, which reduces manual review effort.
Contribution
It proposes a novel calibration method for LLM-based qualitative coding that combines model self-confidence with inter-model diversity. It reports efficiency gains (a 65 percent cut in manual effort) and transferability across domains.
Findings
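Mean self-confidence correlates strongly with inter-model agreement (Pearson r=0.82)
Adding model diversity, measured as normalised Shannon entropy, yields a dual signal that explains agreement almost completely (R-squared=0.979)
A three-tier workflow auto-accepts 35% of segments with under 5% error, cutting manual effort by 65%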
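The two signals behind these findings can be computed per segment roughly as follows. This is a minimal sketch, not the authors' code, and it rests on several assumptions:

- inter-model agreement is taken as the share of models that chose the majority label;
- entropy is normalised by the log of the number of categories (ten, following the abstract);
- the dual signal is fitted as an ordinary least-squares regression of agreement on mean confidence and entropy.

All helper names and example data are hypothetical.

```python
import math
from collections import Counter

import numpy as np


def normalised_entropy(labels, num_categories=10):
    """Shannon entropy of one segment's label votes, divided by log(num_categories).

    Assumption: normalising by the size of the coding scheme keeps the value in [0, 1].
    """
    n = len(labels)
    h = -sum((c / n) * math.log(c / n) for c in Counter(labels).values())
    return h / math.log(num_categories)


def majority_agreement(labels):
    """Assumed agreement measure: fraction of models that chose the most frequent label."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


def pearson_r(xs, ys):
    """Pearson correlation, e.g. between mean self-confidence and agreement across segments."""
    return float(np.corrcoef(xs, ys)[0, 1])


def dual_signal_r2(mean_conf, entropy, agreement):
    """R-squared of an OLS fit: agreement ~ b0 + b1 * mean_conf + b2 * entropy."""
    y = np.asarray(agreement, dtype=float)
    X = np.column_stack([np.ones_like(y), mean_conf, entropy])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


# Hypothetical example: eight models label four segments.
votes = [
    ["A"] * 8,
    ["A"] * 6 + ["B"] * 2,
    ["A"] * 4 + ["B"] * 4,
    ["A", "B", "C", "D"] * 2,
]
mean_conf = [0.95, 0.85, 0.70, 0.55]  # hypothetical per-segment mean self-confidence
entropy = [normalised_entropy(v) for v in votes]
agreement = [majority_agreement(v) for v in votes]
print(pearson_r(mean_conf, agreement))
print(dual_signal_r2(mean_conf, entropy, agreement))
```

With only a handful of segments and three fitted coefficients, R-squared is easily inflated. A figure like 0.979 is meaningful only over many segments, as in the paper's 5,680 decisions.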
Abstract
LLMs enable qualitative coding at large scale, but assessing reliability remains challenging where human experts seldom agree. We investigate confidence-diversity calibration as a quality assessment framework for accessible coding tasks where LLMs already demonstrate strong performance but exhibit overconfidence. Analysing 5,680 coding decisions from eight state-of-the-art LLMs across ten categories, we find that mean self-confidence tracks inter-model agreement closely (Pearson r=0.82). Adding model diversity quantified as normalised Shannon entropy produces a dual signal explaining agreement almost completely (R-squared=0.979), though this high predictive power likely reflects task simplicity for current LLMs. The framework enables a three-tier workflow auto-accepting 35 percent of segments with less than 5 percent error, cutting manual effort by 65 percent. Cross-domain validation…
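The three-tier workflow can be sketched as a routing rule over the two signals. The following are hypothetical placeholders, not values or names from the paper:

- the thresholds;
- the tier names (`auto_accept`, `spot_check`, `manual_review`);
- the `Segment` record.

In practice the thresholds would be tuned on validation data so that the auto-accepted tier meets a target error rate. The hypothetical helper `auto_accept_stats` shows how the two reported figures could be measured: the share of segments auto-accepted and the error rate within that tier. It assumes reference labels are available for checking.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    segment_id: str
    mean_confidence: float  # mean self-reported confidence across models, in [0, 1]
    entropy: float          # normalised Shannon entropy of the models' labels, in [0, 1]


def triage(seg, accept_conf=0.9, accept_entropy=0.1, review_conf=0.6, review_entropy=0.5):
    """Route a segment to one of three tiers; all thresholds are hypothetical."""
    if seg.mean_confidence >= accept_conf and seg.entropy <= accept_entropy:
        return "auto_accept"    # confident and in agreement: accept without review
    if seg.mean_confidence >= review_conf and seg.entropy <= review_entropy:
        return "spot_check"     # middle tier: light-touch human review
    return "manual_review"      # low confidence or high diversity: full human coding


def auto_accept_stats(tiers, correct):
    """Share of segments auto-accepted, and error rate among them (correct: bools vs reference)."""
    accepted = [ok for tier, ok in zip(tiers, correct) if tier == "auto_accept"]
    share = len(accepted) / len(tiers)
    error = 1 - sum(accepted) / len(accepted) if accepted else 0.0
    return share, error


segments = [Segment("s1", 0.95, 0.0), Segment("s2", 0.75, 0.3), Segment("s3", 0.40, 0.8)]
tiers = [triage(s) for s in segments]
print(tiers)  # ['auto_accept', 'spot_check', 'manual_review']
```

The sketch requires both high confidence and low entropy before auto-accepting. One reason for this design is the overconfidence noted in the abstract: confidence alone could pass segments on which the models disagree.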
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Intelligent Tutoring Systems and Adaptive Learning · Online Learning and Analytics
