A Machine Learning Approach for Detection of Mental Health Conditions and Cyberbullying from Social Media
Edward Ajayi, Martha Kachweka, Mawuli Deku, Emily Aiken

TL;DR
This paper develops a multiclass system that detects mental health conditions and cyberbullying in social media posts. It shows that domain-adapted transformers such as MentalBERT are effective for this task, pairing high accuracy with explainability features.
Contribution
The paper introduces a unified classification framework built on a novel "split-then-balance" data pipeline, evaluates multiple models including fine-tuned transformers, and presents a practical explainability tool for moderation workflows.
Findings
End-to-end fine-tuning improves detection performance.
MentalBERT outperforms generic models and zero-shot baselines.
MentalBERT, the top model, achieves an accuracy of 0.92 and a Macro F1 score of 0.76.
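The gap between the two reported metrics is typical of an imbalanced test set: accuracy is dominated by the frequent classes, while Macro F1 averages per-class F1 with equal weight, so weak performance on rare classes pulls it down. The sketch below illustrates that effect on toy data. The labels, counts, and predictions are hypothetical and are not taken from the paper; only the metric definitions are standard.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label that occurs."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced test set: 90 frequent-class posts, 10 rare-class posts.
y_true = ["frequent"] * 90 + ["rare"] * 10
# Hypothetical classifier: perfect on the frequent class, finds only 2 of 10 rare posts.
y_pred = ["frequent"] * 90 + ["rare"] * 2 + ["frequent"] * 8

print("accuracy:", accuracy(y_true, y_pred))
print("macro F1:", round(macro_f1(y_true, y_pred), 3))
```

In this toy case accuracy stays high because the frequent class is handled well, while the low F1 on the rare class drags the macro average down.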
Abstract
Mental health challenges and cyberbullying are increasingly prevalent in digital spaces, necessitating scalable and interpretable detection systems. This paper introduces a unified multiclass classification framework for detecting ten distinct mental health and cyberbullying categories from social media data. We curate datasets from Twitter and Reddit, implementing a rigorous "split-then-balance" pipeline to train on balanced data while evaluating on a realistic, held-out imbalanced test set. We conducted a comprehensive evaluation comparing traditional lexical models, hybrid approaches, and several end-to-end fine-tuned transformers. Our results demonstrate that end-to-end fine-tuning is critical for performance, with the domain-adapted MentalBERT emerging as the top model, achieving an accuracy of 0.92 and a Macro F1 score of 0.76, surpassing both its generic counterpart and a…
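The "split-then-balance" idea described in the abstract can be sketched as follows: the held-out test set is carved off first and left at its natural class distribution, and only the remaining training portion is balanced. This is a minimal sketch under stated assumptions, not the authors' pipeline. The abstract does not say how balancing is done, so random undersampling to the smallest class is assumed here; the function name, the 20% test fraction, and the toy labels are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def split_then_balance(samples, test_fraction=0.2, seed=0):
    """Split first, then balance only the training portion.

    samples: list of (text, label) pairs.
    Returns (balanced_train, imbalanced_test).
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)

    # Step 1: hold out the test set before any balancing, so it keeps
    # the realistic (imbalanced) class distribution.
    n_test = int(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]

    # Step 2: balance the training set only.
    # Assumption: random undersampling to the size of the smallest class.
    by_label = defaultdict(list)
    for text, label in train:
        by_label[label].append((text, label))
    target = min(len(items) for items in by_label.values())

    balanced = []
    for items in by_label.values():
        balanced.extend(rng.sample(items, target))
    rng.shuffle(balanced)
    return balanced, test


# Hypothetical toy corpus: 80 posts of a frequent class, 20 of a rare class.
corpus = [(f"post {i}", "frequent") for i in range(80)]
corpus += [(f"post {i}", "rare") for i in range(80, 100)]

train_balanced, test_set = split_then_balance(corpus)
print("train:", Counter(label for _, label in train_balanced))
print("test: ", Counter(label for _, label in test_set))
```

Balancing after the split keeps test examples out of the resampling step, so the evaluation reflects the class frequencies a deployed system would actually see.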
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Mental Health via Writing · Bullying, Victimization, and Aggression
