# Automated Quality Assessment for LLM-Based Complex Qualitative Coding: A Confidence-Diversity Framework

**Authors:** Zhilong Zhao, Yindi Liu

arXiv: 2508.20462 · 2025-10-01

## TL;DR

This paper introduces a dual-signal framework that combines model confidence with inter-model consensus (external entropy) to assess the quality of AI-assisted qualitative coding. Evaluated on legal, political, and medical corpora, it cuts manual verification effort by 44.6% while maintaining quality.

## Contribution

It develops and validates a domain-agnostic, scalable quality assessment framework for AI-assisted qualitative coding. Optimized signal weights improve over single-signal baselines by 6.6-113.7% and transfer across all three tested domains.

## Key findings

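- External entropy is consistently negatively correlated with accuracy (r = -0.179 to -0.273, p < 0.001)
- Confidence is positively correlated with accuracy in two of the three domains (r = 0.104 to 0.429)
- An intelligent triage protocol reduces manual verification effort by 44.6% while maintaining quality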
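
To make the triage finding concrete, here is a minimal Python sketch of threshold-based routing. This summary does not describe the paper's actual triage rule, so the threshold rule, the function names `triage` and `effort_reduction`, and the example scores are all illustrative assumptions.

```python
from typing import List, Tuple


def triage(scores: List[float], threshold: float) -> Tuple[List[int], List[int]]:
    """Split item indices into auto-accepted and manually verified groups.

    Assumption: a simple cutoff on a per-item quality score; the paper's
    protocol may use a different rule.
    """
    auto = [i for i, s in enumerate(scores) if s >= threshold]
    manual = [i for i, s in enumerate(scores) if s < threshold]
    return auto, manual


def effort_reduction(scores: List[float], threshold: float) -> float:
    """Fraction of items that skip manual verification under the cutoff."""
    if not scores:
        return 0.0
    auto, _ = triage(scores, threshold)
    return len(auto) / len(scores)


# Hypothetical quality scores for four coded documents.
example_scores = [0.9, 0.4, 0.8, 0.2]
auto_ids, manual_ids = triage(example_scores, threshold=0.5)
```

In this toy example, half of the items are auto-accepted. In practice the threshold would be tuned on a validated subset so that accuracy among auto-accepted items stays acceptable.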

## Abstract

Computational social science lacks a scalable and reliable mechanism to assure quality for AI-assisted qualitative coding when tasks demand domain expertise and long-text reasoning, and traditional double-coding is prohibitively costly at scale. We develop and validate a dual-signal quality assessment framework that combines model confidence with inter-model consensus (external entropy) and evaluate it across legal reasoning (390 Supreme Court cases), political analysis (645 hyperpartisan articles), and medical classification (1,000 clinical transcripts). External entropy is consistently negatively associated with accuracy (r = -0.179 to -0.273, p < 0.001), while confidence is positively associated in two domains (r = 0.104 to 0.429). Weight optimization improves over single-signal baselines by 6.6-113.7% and transfers across domains (100% success), and an intelligent triage protocol reduces manual verification effort by 44.6% while maintaining quality. The framework offers a principled, domain-agnostic quality assurance mechanism that scales qualitative coding without extensive double-coding, provides actionable guidance for sampling and verification, and enables larger and more diverse corpora to be analyzed with maintained rigor.
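
The two signals can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: external entropy is taken to be the normalized Shannon entropy of the labels that several models assign to the same item, and the signals are blended with a single linear weight `w`. The function names and the normalization are hypothetical; the full paper defines the exact formulation.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def external_entropy(labels: Sequence[Hashable]) -> float:
    """Normalized Shannon entropy of labels from different models.

    Returns 0.0 when all models agree and approaches 1.0 when every model
    disagrees. Normalizing by log(number of models) is an assumption.
    """
    n = len(labels)
    counts = Counter(labels)
    if n <= 1 or len(counts) == 1:
        return 0.0
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(n)


def quality_score(confidence: float, labels: Sequence[Hashable], w: float = 0.5) -> float:
    """Blend model confidence with inter-model consensus.

    Assumption: a linear combination with weight w on confidence; high
    confidence and low disagreement both raise the score.
    """
    return w * confidence + (1.0 - w) * (1.0 - external_entropy(labels))


# Three hypothetical models code the same document.
agree = quality_score(0.9, ["liberal", "liberal", "liberal"])
split = quality_score(0.9, ["liberal", "liberal", "conservative"])
```

With identical confidence, the item on which the models split receives a lower score than the item on which they agree, which matches the negative association between external entropy and accuracy reported in the abstract.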

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20462/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20462/full.md

## References

42 references — full list in the complete paper: https://tomesphere.com/paper/2508.20462/full.md

---
Source: https://tomesphere.com/paper/2508.20462