QuaLLM-Health: An Adaptation of an LLM-Based Framework for Quantitative Data Extraction from Online Health Discussions
Ramez Kouzy, Roxanna Attar-Olyaee, Michael K. Rooney, Comron J. Hassanzadeh, Junyi Jessy Li, Osama Mohamad

TL;DR
QuaLLM-Health adapts large language models to efficiently extract clinically relevant quantitative data from Reddit health discussions, enabling large-scale analysis with high accuracy and low cost.
Contribution
This work introduces an adapted LLM-based framework, QuaLLM-Health, for extracting structured health data from social media discussions, demonstrating high accuracy and efficiency.
Findings
Achieved over 0.85 accuracy for all variables
F1 scores exceeded 0.90 for key variables
Extraction process cost under $3 and took about one hour
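For readers unfamiliar with the reported metrics, the following is a minimal sketch of how accuracy and F1 can be computed for one binary variable by comparing model output against gold-standard labels. The function names and the five example labels are illustrative assumptions, not the authors' evaluation code or data.

```python
def accuracy(gold, pred):
    """Fraction of entries where the predicted label equals the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f1_score(gold, pred, positive=True):
    """Harmonic mean of precision and recall for the `positive` class."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for a single variable (e.g. "mentions family cancer history").
gold_labels = [True, True, False, False, True]
llm_labels = [True, False, False, False, True]

acc = accuracy(gold_labels, llm_labels)
f1 = f1_score(gold_labels, llm_labels)
print(f"accuracy={acc:.2f}, F1={f1:.2f}")
```

In this toy case the model misses one positive entry, so recall drops to 2/3 while precision stays at 1.0. Reporting F1 alongside accuracy matters when a variable is rare in the data, because accuracy alone can look high even when most positive cases are missed.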
Abstract
Health-related discussions on social media platforms such as Reddit offer valuable insights, but extracting quantitative data from unstructured text is challenging. In this work, we adapt the QuaLLM framework into QuaLLM-Health, which uses large language models (LLMs) to extract clinically relevant quantitative data from Reddit discussions about glucagon-like peptide-1 (GLP-1) receptor agonists. We collected 410k posts and comments from five GLP-1-related communities using the Reddit API in July 2024. After filtering for cancer-related discussions, 2,059 unique entries remained. We developed annotation guidelines to manually extract variables such as cancer survivorship, family cancer history, cancer types mentioned, risk perceptions, and discussions with physicians. Two domain experts independently annotated a random sample of 100 entries to create a gold-standard dataset. We then…
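The filtering step described in the abstract (reducing the collected posts and comments to unique cancer-related entries) could be sketched as below. The abstract does not say how the filter was implemented, so the keyword list, the `text` and `id` field names, and the whitespace and case normalization used for deduplication are all assumptions made for illustration.

```python
import re

# Hypothetical keyword stems; the actual filter terms are not given in the abstract.
CANCER_TERMS = ("cancer", "tumor", "tumour", "carcinoma", "oncolog", "chemo")
CANCER_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in CANCER_TERMS) + r")",
    re.IGNORECASE,
)


def filter_cancer_entries(entries):
    """Keep the first copy of each entry whose text mentions a cancer term."""
    seen = set()
    kept = []
    for entry in entries:
        text = entry["text"]
        # Normalize case and whitespace so trivially different copies collapse.
        key = " ".join(text.lower().split())
        if key in seen:
            continue
        seen.add(key)
        if CANCER_PATTERN.search(text):
            kept.append(entry)
    return kept


# Made-up example entries, not taken from the dataset.
entries = [
    {"id": "a", "text": "My oncologist asked about Ozempic."},
    {"id": "b", "text": "Lost 10 lbs this month!"},
    {"id": "c", "text": "my  oncologist asked about ozempic."},
    {"id": "d", "text": "Family history of thyroid cancer, is this safe?"},
]
kept_ids = [entry["id"] for entry in filter_cancer_entries(entries)]
print(kept_ids)
```

A keyword filter like this favors recall over precision: it is cheap to run over hundreds of thousands of entries, and the smaller set it produces can then be passed to manual annotation or LLM-based extraction.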
Taxonomy
Topics: Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
