Stress Detection on Code-Mixed Texts in Dravidian Languages using Machine Learning
L. Ramos, M. Shahiki-Tash, Z. Ahani, A. Eponon, O. Kolesnikova, H. Calvo

TL;DR
This paper presents a machine learning approach that applies Random Forest to uncleaned text to detect stress in code-mixed Tamil and Telugu texts, achieving competitive macro F1-scores and positioning uncleaned text as a baseline against which future preprocessing techniques can be measured.
Contribution
It introduces a novel stress detection method for code-mixed Dravidian languages using uncleaned data and simple textual representations, outperforming complex models.
Findings
Achieved a macro F1-score of 0.734 for Tamil and 0.727 for Telugu.
Uncleaned data can be effective for mental state detection.
Simple Random Forest with TF-IDF and n-grams outperforms complex models.
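The macro F1-score cited in the findings is the unweighted mean of the per-class F1 scores, so each class counts equally regardless of how many examples it has. A minimal sketch in plain Python follows; the label names and toy predictions are hypothetical and are not data from the paper.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, for illustration only
y_true = ["stressed", "stressed", "stressed", "stressed", "not_stressed", "not_stressed"]
y_pred = ["stressed", "stressed", "stressed", "not_stressed", "not_stressed", "stressed"]

score = macro_f1(y_true, y_pred)
print(round(score, 3))  # per-class F1 is 0.75 and 0.5, so the macro average is 0.625
```

Because the average is unweighted, a weak score on the smaller class pulls the macro F1 down as much as a weak score on the larger one.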
Abstract
Stress is a common feeling in daily life, but because it can affect mental well-being in some situations, the development of robust detection models is imperative. This study introduces a methodical approach to stress identification in code-mixed texts for Dravidian languages. The challenge encompassed two datasets, targeting the Tamil and Telugu languages respectively. This proposal underscores the importance of using uncleaned text as a benchmark to refine future classification methodologies that incorporate diverse preprocessing techniques. A Random Forest algorithm was used with three textual representations: TF-IDF, word unigrams, and a composite of character (1+2+3)-grams. The approach performed well for both languages, achieving a macro F1-score of 0.734 in Tamil and 0.727 in Telugu, surpassing results achieved with different complex techniques such…
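The textual representations named in the abstract can be sketched in plain Python. The code below only illustrates the feature side under stated assumptions: whitespace tokenization with no cleaning, character n-grams of length 1 to 3, and a smoothed IDF variant. The function names, the IDF formula, and the sample sentences are hypothetical, and the abstract does not say how the representations were combined or how the Random Forest was configured.

```python
import math
from collections import Counter


def word_unigrams(text):
    # No cleaning: split on whitespace only, keeping casing and punctuation
    return text.split()


def char_ngrams(text, n_min=1, n_max=3):
    # Composite of character 1-, 2-, and 3-grams taken from the raw string
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(text[i:i + n] for i in range(len(text) - n + 1))
    return grams


def tfidf(docs_tokens):
    # Raw term frequency times a smoothed IDF: ln((1 + N) / (1 + df)) + 1
    n_docs = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    vectors = []
    for tokens in docs_tokens:
        tf = Counter(tokens)
        vectors.append(
            {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


# Made-up code-mixed style sentences, for illustration only
docs = ["romba tension da!!", "semma happy da"]
word_vectors = tfidf([word_unigrams(d) for d in docs])
char_vectors = tfidf([char_ngrams(d) for d in docs])
print(word_vectors[0])
```

Sparse dictionaries like these would then be converted to fixed-length vectors over a shared vocabulary before being passed to a classifier such as Random Forest.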
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Dense Connections · Adam · Linear Layer · Residual Connection · Position-Wise Feed-Forward Layer · Attention Is All You Need · Label Smoothing · Dropout · fastText · Byte Pair Encoding
