Diverse Perspectives, Divergent Models: Cross-Cultural Evaluation of Depression Detection on Twitter
Nuredin Ali, Charles Chuankai Zhang, Ned Mayo, Stevie Chancellor

TL;DR
This study evaluates how well depression detection models trained on benchmark Twitter datasets generalize across cultures. It finds significant performance gaps, especially for Global South users, and emphasizes the need for culturally diverse datasets.
Contribution
It introduces a geo-located Twitter dataset of depressed users from seven countries and uses it as a test set to systematically assess how well models generalize across cultural groups.
Findings
Models perform worse on Global South users
Pre-trained language models generalize better than Logistic Regression
Significant performance gaps remain for depressed and non-Western users, even with pre-trained language models
Abstract
Social media data has been used for detecting users with mental disorders, such as depression. Despite the global significance of cross-cultural representation and its potential impact on model performance, publicly available datasets often lack crucial metadata related to this aspect. In this work, we evaluate the generalization of benchmark datasets to build AI models on cross-cultural Twitter data. We gather a custom geo-located Twitter dataset of depressed users from seven countries as a test dataset. Our results show that depression detection models do not generalize globally. The models perform worse on Global South users compared to Global North. Pre-trained language models achieve the best generalization compared to Logistic Regression, though still show significant gaps in performance on depressed and non-Western users. We quantify our findings and provide several actionable…
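The group-wise comparison described in the abstract (scoring one model separately on Global North and Global South users, then looking at the gap) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the "north"/"south" group labels, and the toy labels and predictions are all hypothetical and chosen only to show the mechanics. F1 on the depressed class is assumed as the metric here; the paper may report different ones.

```python
from collections import defaultdict


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class; returns 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_by_group(y_true, y_pred, groups):
    """Split labels and predictions by group, then score each group separately."""
    buckets = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        buckets[g][0].append(t)
        buckets[g][1].append(p)
    return {g: f1_score(ts, ps) for g, (ts, ps) in buckets.items()}


# Toy data (made up, NOT results from the study): 1 = depressed, 0 = control.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
groups = ["north"] * 4 + ["south"] * 4  # hypothetical group labels

scores = f1_by_group(y_true, y_pred, groups)
gap = scores["north"] - scores["south"]
print(scores, gap)
```

Reporting a per-group score alongside the aggregate is the point of the sketch: a single pooled F1 over all eight toy users would hide the fact that every error falls in one group.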
Taxonomy
Topics: Mental Health via Writing · Sentiment Analysis and Opinion Mining
Methods: Logistic Regression
