Su-RoBERTa: A Semi-supervised Approach to Predicting Suicide Risk through Social Media using Base Language Models

Chayan Tank; Shaina Mehta; Sarthak Pol; Vinayak Katoch; Avinash Anand; Raj Jaiswal; Rajiv Ratn Shah

arXiv:2412.01353·cs.HC·December 20, 2024

TL;DR

This paper introduces Su-RoBERTa, a semi-supervised RoBERTa model fine-tuned on Reddit posts, and shows that smaller language models (fewer than 500M parameters) can predict suicide risk effectively.

Contribution

The study presents Su-RoBERTa, a novel semi-supervised fine-tuning method using base language models and data augmentation for suicide risk prediction from social media.

Findings

01. Su-RoBERTa achieved a 69.84% weighted F1 score.

02. Smaller language models (<500M parameters) are effective for this task.

03. Data augmentation with GPT-2 improves class imbalance handling.
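
For readers unfamiliar with the metric in the first finding, the following is a minimal sketch of how a weighted F1 score is usually computed: the per-class F1 scores are averaged, each weighted by that class's share of the true labels. The function name `weighted_f1` and the labels `"a"` and `"b"` are illustrative assumptions, not taken from the paper, and this is not the authors' evaluation code.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (illustrative helper)."""
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Each class contributes in proportion to its share of the true labels.
        score += (n / total) * f1
    return score


# Toy labels (hypothetical): class "a" has F1 = 0.8, class "b" has F1 = 2/3.
toy_true = ["a", "a", "a", "b"]
toy_pred = ["a", "a", "b", "b"]
toy_score = weighted_f1(toy_true, toy_pred)
```

Because the weights follow class frequency, a weighted F1 is dominated by the majority class when the data is imbalanced, which is one reason the class-balancing step matters for interpreting the reported score.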

Abstract

People increasingly post about their mental state on social media platforms. AI-based systems built on this data can help assess aspects of an individual's mental health, such as suicide risk. This paper studies suicide risk assessment on Reddit data, using base language models to identify patterns in social media posts. We demonstrate that smaller language models, i.e., those with fewer than 500M parameters, can be effective compared with LLMs of more than 500M parameters. We propose Su-RoBERTa, a RoBERTa model fine-tuned for suicide risk prediction that uses both labeled and unlabeled Reddit data and tackles class imbalance through data augmentation with GPT-2. Su-RoBERTa attained a 69.84% weighted F1 score in the final evaluation. This paper demonstrates the effectiveness…
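
The abstract does not say how the unlabeled posts were used or how many synthetic posts were generated. As a rough sketch only, the block below shows one common semi-supervised scheme (confidence-thresholded pseudo-labeling) and a simple way to count how many augmented samples each class would need to match the largest class. The function names, the 0.9 threshold, the placeholder texts, and the class names are all assumptions for illustration; none of them come from the paper.

```python
from collections import Counter


def pseudo_label(unlabeled, predict_proba, threshold=0.9):
    """Keep unlabeled items whose top predicted probability meets the threshold.

    `predict_proba` is a hypothetical callable mapping a text to a dict of
    {label: probability}; in practice it would wrap a fine-tuned classifier.
    """
    selected = []
    for text in unlabeled:
        probs = predict_proba(text)
        label, confidence = max(probs.items(), key=lambda kv: kv[1])
        if confidence >= threshold:
            selected.append((text, label))
    return selected


def augmentation_deficits(labels):
    """Number of extra samples each class needs to match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {label: target - n for label, n in counts.items()}


# Toy stand-in for classifier outputs (hypothetical scores and placeholder texts).
toy_scores = {
    "post A": {"class_0": 0.95, "class_1": 0.05},
    "post B": {"class_0": 0.55, "class_1": 0.45},
    "post C": {"class_0": 0.08, "class_1": 0.92},
}
confident = pseudo_label(list(toy_scores), toy_scores.get, threshold=0.9)
deficits = augmentation_deficits(["class_0"] * 5 + ["class_1"] * 2)
```

In this toy run only "post A" and "post C" clear the threshold, and `class_1` would need three additional samples, which a generator such as GPT-2 could supply in an augmentation pipeline.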

Taxonomy

Topics: Mental Health via Writing

Methods: Attention Is All You Need · Linear Warmup With Linear Decay · WordPiece · Dense Connections · Layer Normalization · Linear Layer · Discriminative Fine-Tuning · Weight Decay · Attention Dropout