# A suite of large language models for public health infoveillance

**Authors:** Xinyu Zhou, Jiaqi Zhou, Chiyu Wang, Qianqian Xie, Kaize Ding, Chengsheng Mao, Yuntian Liu, Zhiyuan Cao, Huangrui Chu, Xi Chen, Hua Xu, Heidi J. Larson, Yuan Luo

PMC · DOI: 10.1038/s41746-026-02435-6 · NPJ Digital Medicine · 2026-02-23

## TL;DR

This paper introduces PH-LLM, a new set of large language models for real-time public health monitoring using social media data.

## Contribution

The PH-LLM suite, trained on a curated multilingual corpus with Qwen 2.5 as the base model, improves zero-shot multilingual public health infoveillance over baseline LLMs of similar and larger sizes.

## Key findings

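- PH-LLM consistently outperformed baseline LLMs of similar and larger sizes in zero-shot evaluation on 19 English and 20 multilingual held-out tasks.
- PH-LLM-14B and PH-LLM-32B surpassed Qwen2.5-72B-Instruct, Llama-3.1-70B-Instruct, Mistral-Large-Instruct-2407, and GPT-4o on both English tasks (≥56.0% vs. ≤52.3%) and multilingual tasks (≥59.6% vs. ≤59.1%).
- The authors present the suite as a cost-effective option for real-time monitoring of public sentiment on health issues.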

## Abstract

Social media is a critical platform for understanding and fostering public engagement with health interventions. However, the lack of real-time social media infoveillance on public health issues may lead to delayed responses and suboptimal policy adjustments. To address this gap, we developed PH-LLM, a novel suite of large language models (LLMs) designed for real-time public health monitoring. We curated a multilingual training corpus and trained PH-LLM using QLoRA and LoRA plus, leveraging Qwen 2.5. We constructed a benchmark comprising 19 English and 20 multilingual held-out tasks and evaluated PH-LLM’s zero-shot performance. PH-LLM consistently outperformed baseline LLMs of similar and larger sizes. PH-LLM-14B and PH-LLM-32B surpassed Qwen2.5-72B-Instruct, Llama-3.1-70B-Instruct, Mistral-Large-Instruct-2407, and GPT-4o in both English tasks (≥56.0% vs. ≤52.3%) and multilingual tasks (≥59.6% vs. ≤59.1%). PH-LLM represents a significant advancement in real-time public health infoveillance, offering state-of-the-art multilingual capabilities and cost-effective solutions for monitoring public sentiment on health issues.
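The zero-shot evaluation described in the abstract can be illustrated with a minimal sketch. The abstract does not state the scoring metric or the aggregation rule, so this example assumes exact-match accuracy per task and an unweighted mean across tasks. The names `evaluate_zero_shot`, `task_accuracy`, `macro_average`, `toy_model`, and `toy_tasks` are hypothetical, and the toy data is invented for illustration only; none of it comes from the paper.

```python
from statistics import mean


def task_accuracy(predictions, gold):
    """Fraction of predictions that exactly match the gold labels (assumed metric)."""
    if not gold or len(predictions) != len(gold):
        raise ValueError("predictions and gold must be non-empty and equal in length")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def evaluate_zero_shot(model_fn, tasks):
    """Score a model on each held-out task without any in-context examples.

    model_fn: callable mapping a prompt string to a label string.
    tasks: dict of task name -> list of (prompt, gold_label) pairs.
    """
    scores = {}
    for name, examples in tasks.items():
        preds = [model_fn(prompt).strip().lower() for prompt, _ in examples]
        gold = [label.strip().lower() for _, label in examples]
        scores[name] = task_accuracy(preds, gold)
    return scores


def macro_average(per_task_scores):
    """Unweighted mean over tasks, so every task counts equally (assumed aggregation)."""
    return mean(per_task_scores.values())


# Hypothetical stand-in for an LLM call: a keyword rule, not a real model.
def toy_model(prompt):
    return "negative" if "worried" in prompt else "positive"


# Invented examples, for illustration only.
toy_tasks = {
    "vaccine_sentiment": [
        ("I am worried about side effects", "negative"),
        ("Got my booster today, feeling great", "positive"),
    ],
    "mask_stance": [
        ("Masks work, wear one", "positive"),
        ("Masks are pointless", "negative"),
    ],
}

scores = evaluate_zero_shot(toy_model, toy_tasks)
print(scores, macro_average(scores))
```

Under these assumptions, an unweighted mean keeps a task with many examples from dominating the headline number, which matters when a benchmark mixes tasks of very different sizes.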

## Full-text entities

- **Diseases:** infected (MESH:D007239), COVID-19 (MESH:D000086382), infectious diseases (MESH:D003141)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13039744/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13039744/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/PMC13039744/full.md

---
Source: https://tomesphere.com/paper/PMC13039744