Multi-LLM Thematic Analysis with Dual Reliability Metrics: Combining Cohen's Kappa and Semantic Similarity for Qualitative Research Validation
Nilesh Jain, Hyungil Suh, Seyi Adeyinka, Leor Roseman, Aza Allsop

TL;DR
This paper introduces a validation framework combining Cohen's Kappa and semantic similarity to assess the reliability of LLM-based thematic analysis in qualitative research, demonstrated on art therapy transcripts.
Contribution
It presents a novel multi-metric validation approach with configurable parameters and consensus extraction, enhancing reliability in AI-assisted qualitative analysis.
Findings
Gemini 2.5 Pro achieved the highest reliability metrics.
All models showed high inter-rater agreement ($\kappa > 0.80$).
The framework successfully extracts consensus themes across multiple runs.
Abstract
Qualitative research faces a critical reliability challenge: traditional inter-rater agreement methods require multiple human coders, are time-intensive, and often yield moderate consistency. We present a multi-perspective validation framework for LLM-based thematic analysis that combines ensemble validation with dual reliability metrics: Cohen's Kappa ($\kappa$) for inter-rater agreement and cosine similarity for semantic consistency. Our framework enables configurable analysis parameters (1-6 seeds, temperature 0.0-2.0), supports custom prompt structures with variable substitution, and provides consensus theme extraction across any JSON format. As proof-of-concept, we evaluate three leading LLMs (Gemini 2.5 Pro, GPT-4o, Claude 3.5 Sonnet) on a psychedelic art therapy interview transcript, conducting six independent runs per model. Results demonstrate Gemini achieves highest…
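The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes each run is reduced to a list of categorical labels (for example, 1 if a candidate theme was identified in that run and 0 otherwise) and that themes are already represented as numeric embedding vectors. The function names, the binary coding scheme, and the toy data are hypothetical; the abstract does not specify how the paper codes themes or which embedding model it uses.

```python
import math
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(rater_a)
    # Observed agreement: fraction of items on which both raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = set(count_a) | set(count_b)
    p_e = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        # Both raters used a single identical label; treat as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical toy data: presence (1) or absence (0) of ten candidate
    # themes in two independent runs of the same model.
    run_1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
    run_2 = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
    print("kappa:", round(cohen_kappa(run_1, run_2), 3))

    # Hypothetical toy embeddings standing in for two theme descriptions.
    theme_a = [0.2, 0.7, 0.1]
    theme_b = [0.25, 0.65, 0.05]
    print("cosine:", round(cosine_similarity(theme_a, theme_b), 3))
```

Kappa corrects raw agreement for agreement expected by chance, while cosine similarity can register two themes as close even when their wording differs, which is why the two measures are complementary.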
Taxonomy
Topics: Reliability and Agreement in Measurement · Computational and Text Analysis Methods · Meta-analysis and systematic reviews
