# From digital traces to public vaccination behaviors: leveraging large language models for big data classification

**Authors:** Yoo Jung Oh, Muhammad Ehab Rasul, Emily McKinley, Christopher Calabrese

PMC · DOI: 10.3389/frai.2025.1602984 · Frontiers in Artificial Intelligence · 2025-07-23

## TL;DR

This paper shows how large language models can analyze social media posts to understand vaccination behaviors and align them with real-world data.

## Contribution

The study introduces a fine-tuned GPT-4o-mini model that outperforms other LLMs in classifying vaccine-related social media content.

## Key findings

- Fine-tuned GPT-4o-mini achieved higher accuracy, precision, recall, and F1 score compared to other models.
- About 9.84% of social media posts reflected personal vaccination behavior, while 71.45% involved information sharing.
- There was a strong correlation (r = 0.76) between social media vaccination behaviors and actual vaccine uptake.

## Abstract

The current study leverages large language models (LLMs) to capture health behaviors expressed in social media posts, focusing on COVID-19 vaccine-related content from 2020 to 2021.

To examine the capabilities of prompt engineering and fine-tuning approaches with LLMs, this study examines the performance of three state-of-the-art LLMs: GPT-4o, GPT-4o-mini, and GPT-4o-mini with fine-tuning, focusing on their ability to classify individuals’ vaccination behavior, intention to vaccinate, and information sharing. We then cross-validate these classifications with nationwide vaccination statistics to assess alignment with observed trends.

GPT-4o-mini with fine-tuning outperformed both GPT-4o and the standard GPT-4o-mini in terms of accuracy, precision, recall, and F1 score. Using GPT-4o-mini with fine-tuning for classification, about 9.84% of the posts (N = 36,912) included personal behavior related to getting the COVID-19 vaccine while a majority of posts (71.45%; N = 267,930) included information sharing about the virus. Lastly, we found a strong correlation (r = 0.76, p < 0.01) between vaccination behaviors expressed on social media and the actual vaccine uptake over time.

This study suggests that LLMs can serve as powerful tools for estimating real-world behaviors. Methodological and practical implications of utilizing LLMs in human behavior research are further discussed.

## Linked entities

- **Diseases:** COVID-19 (MONDO:0100096)

## Full-text entities

- **Diseases:** COVID-19 (MESH:D000086382)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12325327/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12325327/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/PMC12325327/full.md

---
Source: https://tomesphere.com/paper/PMC12325327