TL;DR
This paper introduces an LLM-based method to construct social media datasets for measuring and analyzing loneliness in caregivers and non-caregivers, revealing distinct loneliness causes and demographic insights.
Contribution
It develops an expert-informed framework and pipeline leveraging GPT models to create high-quality datasets and analyze loneliness across populations.
Findings
The loneliness evaluation framework achieved average accuracies of 76.09% for caregivers and 79.78% for non-caregivers.
Cause categorization reached micro-aggregate F1 scores of 0.825 for caregivers and 0.80 for non-caregivers.
Caregivers' loneliness was mainly linked to caregiving roles and identity issues.
Abstract
This paper presents an LLM-driven approach for constructing diverse social media datasets to measure and compare loneliness in the caregiver and non-caregiver populations. We introduce an expert-developed loneliness evaluation framework and an expert-informed typology for categorizing causes of loneliness for analyzing social media text. Using a human-validated data processing pipeline, we apply GPT-4o, GPT-5-nano, and GPT-5 to build a high-quality Reddit corpus and analyze loneliness across both populations. The loneliness evaluation framework achieved average accuracies of 76.09% and 79.78% for caregivers and non-caregivers, respectively. The cause categorization framework achieved micro-aggregate F1 scores of 0.825 and 0.80 for caregivers and non-caregivers, respectively. Across populations, we observe substantial differences in the distribution of types of causes of loneliness.…
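The abstract reports micro-aggregate F1 for cause categorization but this summary does not say how it was computed. As a minimal sketch, assuming standard micro-averaging over a multi-label task (each post may have several causes), the score pools true positives, false positives, and false negatives across all posts and labels before computing F1. The function name `micro_f1` and the cause labels below are hypothetical illustrations, not the paper's typology or data.

```python
from typing import Sequence, Set


def micro_f1(gold: Sequence[Set[str]], pred: Sequence[Set[str]]) -> float:
    """Micro-averaged F1 for multi-label predictions.

    Counts are pooled over every post and every label, so frequent
    labels weigh more than rare ones.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of posts")

    tp = fp = fn = 0
    for gold_set, pred_set in zip(gold, pred):
        tp += len(gold_set & pred_set)   # labels correctly predicted
        fp += len(pred_set - gold_set)   # predicted but not annotated
        fn += len(gold_set - pred_set)   # annotated but missed

    denom = 2 * tp + fp + fn
    # Convention assumed here: return 0.0 when there is nothing to score.
    return 2 * tp / denom if denom else 0.0


# Hypothetical cause labels for three posts (not taken from the paper).
gold_labels = [
    {"caregiving role"},
    {"identity", "social isolation"},
    {"bereavement"},
]
pred_labels = [
    {"caregiving role"},
    {"identity"},
    {"social isolation"},
]

score = micro_f1(gold_labels, pred_labels)
print(f"micro-F1 = {score:.3f}")
```

In this toy example the pooled counts are 2 true positives, 1 false positive, and 2 false negatives, so the score is 4/7. Because micro-averaging pools counts, a score such as 0.825 mostly reflects performance on the most common cause categories.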