Detecting Depression in Thai Blog Posts: a Dataset and a Baseline

Mika Hämäläinen; Pattama Patpong; Khalid Alnajjar; Niko Partanen; Jack Rueter

arXiv:2111.04574 · cs.CL · November 9, 2021

TL;DR

This paper introduces the first openly available Thai depression detection dataset, evaluates four models (two LSTM-based and two BERT-based), reports 77.53% accuracy with a Thai BERT model, and provides a baseline for future research.

Contribution

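It provides the first openly available Thai depression dataset, compiled from expert-verified cases in online blogs, benchmarks LSTM- and BERT-based models on it, and highlights the need for Thai embeddings trained on a more varied corpus than Wikipedia.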

Findings

01. Thai BERT achieved 77.53% accuracy in depression detection.

02. The dataset, code, and trained models are openly available for research on Zenodo.

03. Thai embeddings trained on a more varied corpus than Wikipedia are needed.

Abstract

We present the first openly available corpus for detecting depression in Thai. Our corpus is compiled from expert-verified cases of depression in several online blogs. We experiment with two different LSTM-based models and two different BERT-based models. We achieve 77.53% accuracy in detecting depression with a Thai BERT model. This establishes a good baseline for future research on the same corpus. Furthermore, we identify a need for Thai embeddings that have been trained on a more varied corpus than Wikipedia. Our corpus, code and trained models have been released openly on Zenodo.
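The headline result is an accuracy figure, and the Methods taxonomy lists a linear layer with a sigmoid activation. The following minimal sketch shows how such a number is typically computed for a classifier that emits one score (logit) per blog post. It assumes binary labels (1 = depression, 0 = no depression) and a 0.5 decision threshold; the helpers `sigmoid`, `predict_labels` and `accuracy` are hypothetical and are not taken from the authors' released code.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function: maps a logit to a probability."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # avoids overflow of exp(-x) for very negative x
    return e / (1.0 + e)


def predict_labels(logits: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Hypothetical helper: assumed binary labels, 1 = depression, 0 = none."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def accuracy(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of posts whose predicted label matches the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


if __name__ == "__main__":
    # Toy scores and labels invented for illustration; not the paper's data.
    logits = [2.1, -0.7, 0.3, -1.5]
    gold = [1, 0, 0, 0]
    preds = predict_labels(logits)
    print(preds, f"{accuracy(preds, gold):.2%}")  # prints "[1, 0, 1, 0] 75.00%"
```

With a 0.5 threshold, thresholding the sigmoid output is equivalent to checking whether the logit is non-negative. Reporting accuracy this way treats both classes equally, so it is most informative when the two classes are roughly balanced in the test set.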

Taxonomy

Topics: Mental Health via Writing · Sentiment Analysis and Opinion Mining · Topic Modeling

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Sigmoid Activation · Attention Dropout · WordPiece · Dropout · Weight Decay · Residual Connection