Detection of Suicidal Risk on Social Media: A Hybrid Model
Zaihan Yang, Ryan Leonard, Hien Tran, Rory Driscoll, Chadbourne Davis

TL;DR
This paper presents a hybrid machine learning model that combines RoBERTa, TF-IDF, and PCA to classify Reddit posts into four levels of suicidal risk. The model outperforms RoBERTa alone and traditional classifiers, supporting earlier detection.
Contribution
It introduces a novel hybrid model integrating deep contextual embeddings with statistical features for multi-class suicide risk classification.
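The hybrid pairs RoBERTa embeddings with TF-IDF features compressed by PCA. Below is a minimal sketch of such a feature pipeline. It rests on several assumptions that this summary does not state. The PCA-reduced TF-IDF vectors are assumed to be concatenated with the contextual embeddings. TF-IDF is assumed to use a smoothed IDF with L2 row normalization. The RoBERTa embeddings are passed in as a precomputed array, and a random 768-dimensional placeholder stands in for them here. The helper names `tfidf_matrix`, `pca_reduce`, and `hybrid_features` are hypothetical.

```python
import math
from collections import Counter

import numpy as np


def tfidf_matrix(docs):
    """Smoothed TF-IDF with L2-normalized rows (one common variant, assumed here)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: j for j, tok in enumerate(vocab)}
    n = len(docs)
    # Document frequency: number of posts containing each token at least once.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = np.array([math.log((1 + n) / (1 + df[tok])) + 1 for tok in vocab])
    tf = np.zeros((n, len(vocab)))
    for i, toks in enumerate(tokenized):
        for tok, count in Counter(toks).items():
            tf[i, index[tok]] = count
    x = tf * idf
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.where(norms == 0, 1.0, norms)


def pca_reduce(x, k):
    """Project the rows of x onto their top-k principal components via SVD."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T


def hybrid_features(docs, contextual_embeddings, k=2):
    """Concatenate contextual embeddings with PCA-compressed TF-IDF features."""
    if contextual_embeddings.shape[0] != len(docs):
        raise ValueError("need exactly one embedding row per document")
    reduced = pca_reduce(tfidf_matrix(docs), k)
    return np.hstack([contextual_embeddings, reduced])


posts = [
    "i feel hopeless tonight",
    "great day at the park",
    "nobody would miss me",
    "trying to stay hopeful",
]
rng = np.random.default_rng(0)
# Placeholder for RoBERTa sentence embeddings (hypothetical; not a real model call).
fake_roberta = rng.normal(size=(len(posts), 768))
features = hybrid_features(posts, fake_roberta, k=2)
print(features.shape)  # (4, 770)
```

In a real pipeline, embeddings from a RoBERTa encoder would replace the placeholder. The TF-IDF vocabulary and PCA projection would be fit on the training split only and then applied unchanged to validation and test posts.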
Findings
The hybrid model achieves a weighted F1 score of 0.7512.
Data resampling and augmentation improve model generalization.
The hybrid model outperforms both RoBERTa alone and traditional classifiers.
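The headline metric is a weighted F1 score. It computes F1 for each class and then averages those scores, weighting each class by its support, so frequent risk levels contribute more to the final number than rare ones. Below is a small standard-library sketch of that computation. The integer labels 0 to 3 and the sample predictions are made up for illustration. The result should match scikit-learn's `f1_score(..., average="weighted")`, although the summary does not say which tooling the authors used.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 (classes taken from y_true)."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = count - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * count / total  # weight by the class's share of samples
    return score


# Hypothetical gold labels and predictions over four risk levels (0-3).
y_true = [0, 0, 1, 1, 2, 3]
y_pred = [0, 1, 1, 1, 2, 2]
print(round(weighted_f1(y_true, y_pred), 4))  # 0.6
```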
Abstract
Suicidal thoughts and behaviors are increasingly recognized as a critical societal concern, highlighting the urgent need for effective tools that enable early detection of suicidal risk. In this work, we develop machine learning models that automatically classify Reddit posts into four distinct levels of suicide risk severity. We frame this as a multi-class classification task and propose a RoBERTa-TF-IDF-PCA Hybrid model. The model integrates deep contextual embeddings from RoBERTa (Robustly Optimized BERT Approach), a state-of-the-art transformer model, with TF-IDF statistical term weighting compressed by PCA, to improve the accuracy and reliability of suicide risk assessment. To address data imbalance and overfitting, we explore various data resampling techniques and data augmentation strategies to enhance model generalization.…
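The abstract mentions resampling to counter class imbalance, but the excerpt does not name the specific techniques. Random oversampling, shown below, is one common option and appears here purely as an illustration. Minority-class posts are duplicated at random until every class matches the size of the largest one. The function name `random_oversample` and the fixed seed are assumptions of this sketch.

```python
import random
from collections import Counter, defaultdict


def random_oversample(texts, labels, seed=0):
    """Duplicate random minority-class samples until all classes are equally sized."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)
    target = max(len(items) for items in by_label.values())
    out_texts, out_labels = [], []
    for label, items in by_label.items():
        # Draw with replacement from the existing posts of this class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


texts = ["post a", "post b", "post c", "post d", "post e"]
labels = [0, 0, 0, 1, 2]
new_texts, new_labels = random_oversample(texts, labels)
print(sorted(Counter(new_labels).items()))  # [(0, 3), (1, 3), (2, 3)]
```

Resampling of this kind is normally applied to the training split only, so that duplicated posts do not leak into evaluation and inflate the reported scores.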
Taxonomy
Topics: Mental Health via Writing
Methods: Linear Layer · Attention Dropout · Softmax · WordPiece · Weight Decay · Multi-Head Attention · Attention Is All You Need · Linear Warmup With Linear Decay · Dropout
