Analyzing Cancer Patients' Experiences with Embedding-based Topic Modeling and LLMs
Teodor-Călin Ionescu, Lifeng Han, Jan Heijdra Suasnabar, Anne Stiggelbout, Suzan Verberne

TL;DR
This paper evaluates neural topic modeling and LLMs for extracting meaningful themes from cancer patient stories, demonstrating that domain-specific embeddings improve interpretability and relevance of identified topics.
Contribution
The paper presents a comparative analysis of BERTopic and Top2Vec for summarizing patient interviews and shows that clinical embeddings such as BioClinicalBERT are effective for healthcare-related topic modeling.
Findings
BERTopic outperforms Top2Vec in keyword extraction
BioClinicalBERT embeddings enhance topic interpretability
Identified dominant themes include care coordination and patient decision-making
Abstract
This study investigates the use of neural topic modeling and LLMs to uncover meaningful themes from patient storytelling data, with the aim of offering insights that could contribute to more patient-oriented healthcare practices. We analyze a collection of transcribed interviews with cancer patients (132,722 words in 13 interviews). We first evaluate BERTopic and Top2Vec for individual interview summarization, using similar preprocessing, chunking, and clustering configurations to ensure a fair comparison on keyword extraction. LLMs (GPT-4) are then used for the subsequent topic-labeling step. The resulting outputs for a single interview (I0) are rated in a small-scale human evaluation focusing on coherence, clarity, and relevance. Based on the preliminary results and evaluation, BERTopic shows stronger performance and is selected for further experimentation using three clinically oriented embedding…
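The chunking step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the chunk size or whether chunks overlap, so the function name `chunk_transcript` and the parameters `chunk_words` and `overlap` below are assumptions chosen for illustration, not the authors' configuration.

```python
from typing import List


def chunk_transcript(text: str, chunk_words: int = 100, overlap: int = 20) -> List[str]:
    """Split a transcript into overlapping windows of whitespace-separated words.

    chunk_words and overlap are illustrative defaults (assumptions), not
    values reported in the paper.
    """
    if chunk_words <= 0:
        raise ValueError("chunk_words must be positive")
    if not 0 <= overlap < chunk_words:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_words")

    words = text.split()
    step = chunk_words - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_words]))
        if start + chunk_words >= len(words):
            break  # this window already reaches the end of the transcript
    return chunks
```

Applying the same chunking function to the input of both models is one way to keep the comparison fair, since each model then clusters an identical list of text segments.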
Taxonomy
Topics: Topic Modeling · Mental Health via Writing · Health Literacy and Information Accessibility
