# Analyzing Sleep Behavior Using BERT-BiLSTM and Fine-Tuned GPT-2 Sentiment Classification: Comparison Study

**Authors:** Yihan Deng, Julia van der Meer, Athina Tzovara, Markus Schmidt, Claudio Bassetti, Kerstin Denecke

PMC · DOI: 10.2196/70753 · JMIR Medical Informatics · 2025-11-10

## TL;DR

This paper applies aspect-based sentiment analysis to clinical narratives to compare subjective and objective assessments of sleep, and finds that patient-reported experience can diverge from clinical measures.

## Contribution

A novel aspect-based sentiment analysis method using BERT-BiLSTM and fine-tuned GPT-2 to analyze sleep-related clinical narratives.

## Key findings

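- Approximately 15% of a 100-patient cohort showed notable discrepancies between subjective (KSS) and objective (MSLT) measures of daytime sleepiness.
- The comparison based on text-derived sentiment scores showed a statistically significant divergence (P=.047), whereas the KSS-versus-MSLT comparison only approached significance (P=.06).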
- Standardized sleep tests may not fully capture patient-reported experiences.

## Abstract

The diagnosis of sleep disorders presents a challenging landscape, characterized by the complex nature of their assessment and the often divergent views between objective clinical assessment and subjective patient experience. This study explores the interplay between these perspectives, focusing on the variability of individual perceptions of sleep quality and latency.

Our primary goal was to investigate the alignment, or lack thereof, between subjective experiences and objective measures in the assessment of sleep disorders.

To study this, we developed an aspect-based sentiment analysis method for clinical narratives. Using large language models (Falcon 40B and Mixtral 8X7B), we identified entity groups for 3 aspects of sleep behavior (day sleepiness, sleep quality, and fatigue). We then assigned sentiment values between 0 and 1 to phrases referring to these aspects, using a BERT-BiLSTM–based approach (accuracy 78%) and a fine-tuned GPT-2 sentiment classifier (accuracy 87%).
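The step from phrase-level sentiment values to a per-aspect summary can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `aggregate_aspect_sentiment`, the use of a simple mean, and the example scores are all assumptions. Only the three aspect names and the 0 to 1 score range come from the abstract.

```python
from collections import defaultdict
from statistics import mean

# Aspect names as listed in the abstract.
ASPECTS = ("day sleepiness", "sleep quality", "fatigue")


def aggregate_aspect_sentiment(scored_phrases):
    """Average phrase-level sentiment scores (0..1) per aspect.

    scored_phrases: iterable of (aspect, score) pairs, where the score is
    assumed to come from some upstream sentiment classifier.
    Averaging is an illustrative choice, not the paper's stated method.
    """
    by_aspect = defaultdict(list)
    for aspect, score in scored_phrases:
        if aspect not in ASPECTS:
            raise ValueError(f"unknown aspect: {aspect!r}")
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range [0, 1]: {score}")
        by_aspect[aspect].append(score)
    # Aspects with no phrases are simply absent from the result.
    return {aspect: mean(scores) for aspect, scores in by_aspect.items()}


# Hypothetical scores for one patient's narrative.
example = [
    ("day sleepiness", 0.25),
    ("day sleepiness", 0.75),
    ("fatigue", 0.5),
]
result = aggregate_aspect_sentiment(example)
```

Here `result` holds one averaged score for each aspect that was mentioned; an aspect with no matching phrases (sleep quality in this example) is left out rather than given a default value.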

In a cohort of 100 patients with complete subjective (Karolinska Sleepiness Scale [KSS]) and objective (Multiple Sleep Latency Test [MSLT]) assessments, approximately 15% exhibited notable discrepancies between perceived and measured levels of daytime sleepiness. A paired-sample t test comparing KSS scores to MSLT latencies approached statistical significance (t(99)=2.456; P=.06), suggesting a potential misalignment between subjective reports and physiological markers. In contrast, the comparison using text-derived sentiment scores revealed a statistically significant divergence (t(99)=2.324; P=.047), indicating that clinical narratives may more reliably capture discrepancies in sleepiness perception. These results underscore the importance of integrating multiple subjective sources, with an emphasis on narrative free text, in the assessment of domains such as fatigue and daytime sleepiness, where standardized measures may not fully reflect the patient’s lived experience.
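For readers unfamiliar with the paired-sample t test, the statistic can be computed as in the sketch below. It assumes that the subjective and objective measures have already been rescaled to a common range; the abstract does not say how KSS scores and MSLT latencies were made comparable, so that step, the function name `paired_t_statistic`, and the sample values are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(x, y):
    """Return (t, degrees_of_freedom) for a paired-sample t test.

    x, y: equal-length sequences of paired measurements, assumed to be
    on a common scale (an assumption of this sketch).
    """
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    if len(x) < 2:
        raise ValueError("need at least two pairs")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical, already-normalized scores for four patients.
subjective = [0.5, 0.75, 0.25, 1.0]
objective = [0.25, 0.25, 0.25, 0.5]
t_value, dof = paired_t_statistic(subjective, objective)
```

With 100 paired observations the degrees of freedom are 99, which matches the cohort size reported above. Converting the statistic to a P value requires the t distribution; `scipy.stats.ttest_rel` returns both in one call if SciPy is available.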

Our method has the potential to uncover critical insights into how patient self-perception compares with clinical evaluation, which could help clinicians identify patients whose self-reported symptoms require objective verification.
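One simple way to picture how such patients might be identified is a threshold on the gap between a subjective and an objective score. The sketch below is illustrative only: the threshold of 0.3, the patient identifiers, the scores, and the function `flag_discrepancies` are hypothetical, and the abstract does not state the criterion the authors used for a "notable" discrepancy.

```python
def flag_discrepancies(patients, threshold=0.3):
    """Return sorted IDs of patients whose scores differ by >= threshold.

    patients: dict mapping patient ID -> (subjective, objective), with
    both scores assumed to be normalized to [0, 1].
    threshold: hypothetical cutoff, not taken from the paper.
    """
    return sorted(
        pid
        for pid, (subjective, objective) in patients.items()
        if abs(subjective - objective) >= threshold
    )


# Hypothetical normalized scores.
cohort = {
    "p01": (0.8, 0.3),   # reports far more sleepiness than measured
    "p02": (0.5, 0.45),  # subjective and objective roughly agree
    "p03": (0.2, 0.6),   # reports far less sleepiness than measured
}
flagged = flag_discrepancies(cohort)
```

In this example `flagged` contains `p01` and `p03`, the two patients whose self-report and measurement disagree in either direction.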

## Full-text entities

- **Diseases:** fatigue (MESH:D005221), day sleepiness (MESH:D014786), Sleepiness (MESH:D000077260), daytime sleepiness (MESH:D012893)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12599995/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12599995/full.md

## References

35 references — full list in the complete paper: https://tomesphere.com/paper/PMC12599995/full.md

---
Source: https://tomesphere.com/paper/PMC12599995