Auto-TA: Towards Scalable Automated Thematic Analysis (TA) via Multi-Agent Large Language Models with Reinforcement Learning

Seungjun Yi; Joakim Nguyen; Huimin Xu; Terence Lim; Andrew Well; Mia Markey; Ying Ding

arXiv:2506.23998 · cs.CL · August 12, 2025

TL;DR

This paper introduces Auto-TA, a fully automated multi-agent LLM pipeline, optionally refined with reinforcement learning from human feedback (RLHF), that performs end-to-end thematic analysis of clinical narratives in healthcare without manual coding or full transcript review.

Contribution

The paper presents a novel multi-agent LLM framework, optionally combined with RLHF, for automated thematic analysis of clinical narratives, aimed at improving scalability and theme quality.

Findings

01 Automated TA reduces manual coding effort.

02 Multi-agent framework improves theme quality.

03 Reinforcement learning aligns themes with human analysis.

Abstract

Congenital heart disease (CHD) presents complex, lifelong challenges often underrepresented in traditional clinical metrics. While unstructured narratives offer rich insights into patient and caregiver experiences, manual thematic analysis (TA) remains labor-intensive and unscalable. We propose a fully automated large language model (LLM) pipeline that performs end-to-end TA on clinical narratives, which eliminates the need for manual coding or full transcript review. Our system employs a novel multi-agent framework, where specialized LLM agents assume roles to enhance theme quality and alignment with human analysis. To further improve thematic relevance, we optionally integrate reinforcement learning from human feedback (RLHF). This supports scalable, patient-centered analysis of large qualitative datasets and allows LLMs to be fine-tuned for specific clinical contexts.

Taxonomy

Topics: Machine Learning in Healthcare · Genomics and Rare Diseases · Explainable Artificial Intelligence (XAI)