# Between the lines: investigating health beliefs and emotional expressions in online mental health communities

**Authors:** Khanh Nguyen, Binh Vu, Swati Chandna, Jobst-Hendrik Schultz, Gwendolyn Mayer

PMC · DOI: 10.3389/fpsyg.2025.1521623 · Frontiers in Psychology · 2025-10-24

## TL;DR

This study examines how people express health beliefs and emotions in online mental health communities on Reddit, using the Health Belief Model (HBM) and machine learning classifiers.

## Contribution

The study develops and evaluates a novel methodology that combines HBM component classification with emotional analysis of online mental health discussions.

## Key findings

- DistilBERT achieved 75-84% accuracy in classifying most HBM components, but reached only 47% for perceived severity because of its multi-label nature.
- Combining GPT-4 keyword extraction with human review improved perceived severity accuracy to 81%.
- Emotional analysis showed that users tend to use more negative language in contexts with higher perceived severity.

## Abstract

Social media platforms play an important role in mental health discourse. Applying the Health Belief Model (HBM) to health-related discussions on Reddit could yield deeper insights into individuals' perceptions of mental health threats and barriers to seeking help. The primary objective of this research is to develop an efficient methodology not only for classifying key HBM components—such as perceived susceptibility, severity, benefits, barriers, cues to action, and self-efficacy—but also for examining emotional expressions within these discussions.

A sample of 5,000 posts was selected for classification and a subset was manually labelled for further analysis. Multiple models were tested in classification tasks. Data analysis utilized visualization techniques—such as word clouds, heatmaps, and emotional content analysis—to identify thematic trends and emotional expressions in the discussions.

DistilBERT outperformed other approaches, achieving accuracy rates between 75% and 84% for most components. However, predicting perceived severity remained difficult: accuracy was only 47%, which is attributed to its multi-label nature. To address this, GPT-4-based keyword extraction was combined with human review, improving accuracy to 81%. The emotional content analysis reveals patterns in mental health discussions, such as users attributing anxiety to personality as a root cause, and the urgent need for targeted interventions in cases of suicidal ideation.
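One way to see why a multi-label component can score much lower than single-label ones is to compare exact-match accuracy with per-label accuracy. The sketch below is illustrative only: the summary does not say which accuracy definition the paper used, and the label names, the toy data, and both helper functions are assumptions made for this example.

```python
from typing import List, Set


def exact_match_accuracy(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Fraction of posts whose predicted label set equals the true set exactly."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


def per_label_accuracy(
    y_true: List[Set[str]], y_pred: List[Set[str]], labels: List[str]
) -> float:
    """Fraction of individual (post, label) yes/no decisions that are correct."""
    correct = 0
    for t, p in zip(y_true, y_pred):
        for label in labels:
            correct += (label in t) == (label in p)
    return correct / (len(y_true) * len(labels))


# Hypothetical severity sub-labels; the paper's actual label scheme is not given here.
LABELS = ["emotional_distress", "functional_impact", "physical_symptoms"]

# Toy annotations and predictions for four posts (invented for illustration).
y_true = [
    {"emotional_distress"},
    {"functional_impact", "physical_symptoms"},
    {"physical_symptoms"},
    {"emotional_distress", "functional_impact"},
]
y_pred = [
    {"emotional_distress"},
    {"physical_symptoms"},
    {"physical_symptoms"},
    {"emotional_distress", "physical_symptoms"},
]

exact = exact_match_accuracy(y_true, y_pred)
per_label = per_label_accuracy(y_true, y_pred, LABELS)
print(f"exact-match: {exact:.2f}, per-label: {per_label:.2f}")
```

On this toy data the exact-match score is lower than the per-label score, because a post counts as wrong whenever any one of its labels is missed or added. Whether this effect explains the reported 47% is not something the summary confirms.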

Findings demonstrate that users tend to use more negative language in contexts with higher perceived severity. Future work should prioritize improving model adaptability to health-specific data, handling rare terms, conducting nuanced emotional analyses of written expressions, and addressing ethical implications in analyzing user-generated content.
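A comparison of negative language across severity levels could be approximated with a simple lexicon count, as sketched below. This is not the paper's method, which the summary does not specify: the word list, the severity groups, the example posts, and the function names are all hypothetical and chosen only to show the shape of such an analysis.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Toy negative-word lexicon (hypothetical; a real study would use a validated resource).
NEGATIVE_WORDS = {"hopeless", "worthless", "afraid", "panic", "alone"}


def negative_share(text: str) -> float:
    """Share of tokens in the text that appear in the negative-word lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(tok in NEGATIVE_WORDS for tok in tokens) / len(tokens)


def mean_negative_share_by_group(posts: List[Tuple[str, str]]) -> Dict[str, float]:
    """Average negative-word share per severity group."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for severity, text in posts:
        grouped[severity].append(negative_share(text))
    return {group: sum(vals) / len(vals) for group, vals in grouped.items()}


# Invented (severity group, post text) pairs, not taken from the study's data.
posts = [
    ("low", "I felt a bit nervous before the exam"),
    ("high", "I feel hopeless and alone"),
    ("high", "panic again today"),
]

shares = mean_negative_share_by_group(posts)
for group, share in sorted(shares.items()):
    print(f"{group}: {share:.2f}")
```

A lexicon count like this ignores negation and context, which is one reason the abstract calls for more nuanced emotional analyses.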

## Full-text entities

- **Diseases:** suicidal ideation (MESH:D001072), anxiety (MESH:D001007)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12593464/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12593464/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/PMC12593464/full.md

---
Source: https://tomesphere.com/paper/PMC12593464