Automated Thematic Analysis for Clinical Qualitative Data: Iterative Codebook Refinement with Full Provenance
Seungjun Yi, Joakim Nguyen, Huimin Xu, Terence Lim, Joseph Skrovan, Mehak Beri, Hitakshi Modi, Andrew Well, Carlos M. Mery, Yan Zhang, Mia K. Markey, Ying Ding

TL;DR
This paper introduces an automated thematic analysis framework for clinical qualitative data that improves codebook quality and reproducibility through iterative refinement and provenance tracking, achieving the highest composite quality score on four of five datasets against six baselines.
Contribution
The paper presents an automated thematic analysis (TA) approach that combines iterative codebook refinement with full provenance tracking, targeting the limited generalizability and missing auditability of prior LLM-based methods.
Findings
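Achieves the highest composite quality score on four of five datasets, compared with six baselines
Iterative refinement yields statistically significant improvements on four datasets, driven by gains in code reusability and distributional consistency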
Generated themes align with expert annotations in clinical data
Abstract
Thematic analysis (TA) is widely used in health research to extract patterns from patient interviews, yet manual TA faces challenges in scalability and reproducibility. LLM-based automation can help, but existing approaches produce codebooks with limited generalizability and lack analytic auditability. We present an automated TA framework combining iterative codebook refinement with full provenance tracking. Evaluated on five corpora spanning clinical interviews, social media, and public transcripts, the framework achieves the highest composite quality score on four of five datasets compared to six baselines. Iterative refinement yields statistically significant improvements on four datasets with large effect sizes, driven by gains in code reusability and distributional consistency while preserving descriptive quality. On two clinical corpora (pediatric cardiology), generated themes…
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Biomedical Text Mining and Ontologies
