LLM-TA: An LLM-Enhanced Thematic Analysis Pipeline for Transcripts from Parents of Children with Congenital Heart Disease
Muhammad Zain Raza, Jiawei Xu, Terence Lim, Lily Boddy, Carlos M. Mery, Andrew Well, Ying Ding

TL;DR
This paper presents LLM-TA, a pipeline that leverages large language models to enhance the thematic analysis of healthcare transcripts, aiming to improve scalability, efficiency, and accuracy in analyzing complex qualitative data.
Contribution
The study introduces an LLM-Enhanced Thematic Analysis pipeline that integrates GPT-4o mini, LangChain, and prompt engineering to assist in inductive thematic analysis of healthcare transcripts.
Findings
The pipeline outperforms existing LLM-assisted TA methods.
It improves scalability, efficiency, and accuracy in thematic analysis.
Collaborative use with domain experts enhances real-world applicability.
Abstract
Thematic Analysis (TA) is a fundamental method in healthcare research for analyzing transcript data, but it is resource-intensive and difficult to scale for large, complex datasets. This study investigates the potential of large language models (LLMs) to augment the inductive TA process in high-stakes healthcare settings. Focusing on interview transcripts from parents of children with Anomalous Aortic Origin of a Coronary Artery (AAOCA), a rare congenital heart disease, we propose an LLM-Enhanced Thematic Analysis (LLM-TA) pipeline. Our pipeline integrates an affordable state-of-the-art LLM (GPT-4o mini), LangChain, and prompt engineering with chunking techniques to analyze nine detailed transcripts following the inductive TA framework. We evaluate the LLM-generated themes against human-generated results using thematic similarity metrics, LLM-assisted assessments, and expert reviews.…
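The abstract mentions chunking transcripts before prompting the LLM and comparing LLM-generated themes with human-generated ones. The following is a minimal Python sketch of those two ideas, not the authors' implementation. The function names (`chunk_transcript`, `jaccard`, `best_match_scores`), the word-window chunking with overlap, and the token-level Jaccard score are all assumptions chosen for illustration; the paper's actual chunk sizes, prompts, and thematic similarity metrics are not given in this summary.

```python
import re
from typing import Dict, List, Set


def chunk_transcript(text: str, max_words: int = 200, overlap: int = 20) -> List[str]:
    """Split a transcript into overlapping word windows (hypothetical helper).

    Overlap keeps some context shared between neighbouring chunks so that a
    statement cut at a boundary still appears whole in one of them.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the transcript
    return chunks


def _tokens(theme: str) -> Set[str]:
    """Lowercase alphabetic tokens of a theme label."""
    return set(re.findall(r"[a-z]+", theme.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two theme labels.

    This is a stand-in metric (an assumption), not the paper's metric.
    """
    ta, tb = _tokens(a), _tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_match_scores(llm_themes: List[str], human_themes: List[str]) -> Dict[str, float]:
    """For each human theme, the score of its closest LLM-generated theme."""
    return {
        h: max((jaccard(h, l) for l in llm_themes), default=0.0)
        for h in human_themes
    }
```

In a real pipeline, each chunk would be sent to the LLM with a theme-extraction prompt, and a surface-level score such as this one would likely be replaced by an embedding-based or expert-rated comparison, since lexical overlap misses paraphrased themes.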
Taxonomy
Topics: Machine Learning in Bioinformatics · Genomics and Phylogenetic Studies
