A Confidence-Diversity Framework for Calibrating AI Judgement in Accessible Qualitative Coding Tasks

Zhilong Zhao; Yindi Liu

arXiv:2508.02029 · cs.LG · August 19, 2025

Open Access

TL;DR

This paper introduces a confidence-diversity calibration framework for assessing the reliability of LLM judgements in qualitative coding. By combining model self-confidence with inter-model diversity signals, it reduces manual review effort while keeping coding reliable.

Contribution

The paper proposes a calibration method for LLM-based qualitative coding that combines model self-confidence with inter-model diversity, and reports substantial efficiency gains and transferability across domains.
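As an illustration of how the two signals might be combined, the following sketch fits an ordinary least-squares model that predicts inter-model agreement from mean self-confidence and normalised entropy, then reports R-squared. The linear form and the helper name `fit_dual_signal` are assumptions for illustration only; this summary does not specify the exact model the authors fit.

```python
import numpy as np


def fit_dual_signal(mean_confidence, entropy, agreement):
    """Hypothetical helper: fit agreement ~ b0 + b1*confidence + b2*entropy.

    Assumes a plain linear model estimated by least squares.
    Returns (coefficients, R-squared).
    """
    conf = np.asarray(mean_confidence, dtype=float)
    ent = np.asarray(entropy, dtype=float)
    y = np.asarray(agreement, dtype=float)

    # Design matrix: intercept column, confidence, entropy.
    X = np.column_stack([np.ones_like(conf), conf, ent])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)

    # R-squared = 1 - residual sum of squares / total sum of squares.
    residual = y - X @ beta
    ss_res = float(residual @ residual)
    centred = y - y.mean()
    ss_tot = float(centred @ centred)
    return beta, 1.0 - ss_res / ss_tot


# Illustrative toy data (not from the paper), generated exactly as
# agreement = 0.2 + 0.5 * confidence - 0.4 * entropy.
conf = [0.90, 0.80, 0.60, 0.70, 0.95]
ent = [0.10, 0.30, 0.60, 0.20, 0.00]
agree = [0.2 + 0.5 * c - 0.4 * e for c, e in zip(conf, ent)]
beta, r2 = fit_dual_signal(conf, ent, agree)
```

Because the toy data are exactly linear, the fit recovers the generating coefficients and R-squared is 1 up to rounding; real coding decisions will be noisier.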

Findings

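1. Mean self-confidence correlates strongly with inter-model agreement (Pearson r = 0.82).
2. Combining confidence with model diversity, measured as normalised Shannon entropy, explains agreement almost completely (R-squared = 0.979).
3. A three-tier workflow auto-accepts 35% of segments with under 5% error, cutting manual effort by 65%.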
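The per-segment signals behind these findings can be sketched as follows, assuming each segment receives one categorical label from each model. Defining agreement as the share of models that chose the modal label, and normalising entropy by the log of the largest achievable number of distinct labels, are assumptions; the function names are hypothetical.

```python
import math
from collections import Counter


def normalised_entropy(labels, n_categories):
    """Shannon entropy of the models' label distribution, scaled to [0, 1].

    Assumption: divide by log(min(n_models, n_categories)), the largest
    entropy the models could produce for a single segment.
    """
    n = len(labels)
    counts = Counter(labels)
    max_distinct = min(n, n_categories)
    if len(counts) == 1 or max_distinct <= 1:
        return 0.0  # every model chose the same label: no diversity
    h = sum(-(c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(max_distinct)


def agreement(labels):
    """Assumed agreement measure: share of models that chose the modal label."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy labels from eight models for two segments (illustrative data).
unanimous = ["A"] * 8
split = ["A"] * 4 + ["B"] * 4
print(normalised_entropy(unanimous, 10), agreement(unanimous))   # 0.0 1.0
print(round(normalised_entropy(split, 10), 4), agreement(split))  # 0.3333 0.5
```

Across many segments, `pearson_r` applied to per-segment mean confidence and `agreement` values gives the kind of correlation reported in the first finding.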

Abstract

LLMs enable qualitative coding at large scale, but assessing the reliability of their output remains challenging in domains where human experts seldom agree. We investigate confidence-diversity calibration as a quality assessment framework for accessible coding tasks, where LLMs already demonstrate strong performance but exhibit overconfidence. Analysing 5,680 coding decisions from eight state-of-the-art LLMs across ten categories, we find that mean self-confidence tracks inter-model agreement closely (Pearson r=0.82). Adding model diversity, quantified as normalised Shannon entropy, produces a dual signal that explains agreement almost completely (R-squared=0.979), though this high predictive power likely reflects task simplicity for current LLMs. The framework enables a three-tier workflow that auto-accepts 35 percent of segments with less than 5 percent error, cutting manual effort by 65 percent. Cross-domain validation…
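The three-tier workflow can be sketched as a routing rule over the two signals. Only the auto-accept tier is named in the source; the other two tier names (`spot_check`, `manual_review`), the `Segment` type, and all threshold values are hypothetical placeholders, not the paper's calibrated cut-offs.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    segment_id: str
    mean_confidence: float  # mean self-reported confidence across models, 0..1
    entropy: float          # normalised Shannon entropy of model labels, 0..1


# Hypothetical thresholds for illustration; not taken from the paper.
AUTO_MIN_CONFIDENCE = 0.90
AUTO_MAX_ENTROPY = 0.10
REVIEW_MIN_ENTROPY = 0.50


def route(seg: Segment) -> str:
    """Assign a segment to one of three (assumed) review tiers."""
    # High confidence and near-unanimous labels: accept without review.
    if seg.mean_confidence >= AUTO_MIN_CONFIDENCE and seg.entropy <= AUTO_MAX_ENTROPY:
        return "auto_accept"
    # Strong disagreement between models: send to a human coder.
    if seg.entropy >= REVIEW_MIN_ENTROPY:
        return "manual_review"
    # Everything in between gets a lighter check.
    return "spot_check"


def tier_shares(segments):
    """Fraction of segments routed to each tier."""
    counts = {"auto_accept": 0, "spot_check": 0, "manual_review": 0}
    for seg in segments:
        counts[route(seg)] += 1
    n = len(segments)
    return {tier: c / n for tier, c in counts.items()}


segments = [
    Segment("s1", 0.95, 0.00),
    Segment("s2", 0.80, 0.30),
    Segment("s3", 0.60, 0.70),
    Segment("s4", 0.92, 0.05),
]
print(tier_shares(segments))
# {'auto_accept': 0.5, 'spot_check': 0.25, 'manual_review': 0.25}
```

In practice the thresholds would be tuned on labelled data so that the auto-accepted share stays within a target error rate, such as the under-5-percent error the abstract reports.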

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Intelligent Tutoring Systems and Adaptive Learning · Online Learning and Analytics