Detecting Depression in Thai Blog Posts: a Dataset and a Baseline
Mika Hämäläinen, Pattama Patpong, Khalid Alnajjar, Niko Partanen, Jack Rueter

TL;DR
This paper introduces the first openly available Thai depression detection dataset, evaluates two LSTM-based and two BERT-based models on it, and reports 77.53% accuracy with a Thai BERT model as a baseline for future research.
Contribution
It provides the first openly available, expert-verified Thai depression dataset, benchmarks two LSTM-based and two BERT-based models, and highlights the need for Thai embeddings trained on a more varied corpus than Wikipedia.
Findings
Thai BERT achieved 77.53% accuracy in depression detection.
The dataset, code, and models are openly available for research.
Thai embeddings trained on a more varied corpus than Wikipedia are needed.
Abstract
We present the first openly available corpus for detecting depression in Thai. Our corpus is compiled from expert-verified cases of depression in several online blogs. We experiment with two different LSTM-based models and two different BERT-based models. We achieve 77.53% accuracy in detecting depression with a Thai BERT model. This establishes a good baseline for future research on the same corpus. Furthermore, we identify a need for Thai embeddings that have been trained on a more varied corpus than Wikipedia. Our corpus, code, and trained models have been released openly on Zenodo.
Taxonomy
Topics: Mental Health via Writing · Sentiment Analysis and Opinion Mining · Topic Modeling
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Sigmoid Activation · Attention Dropout · WordPiece · Dropout · Weight Decay · Residual Connection
