Early Linguistic Pattern of Anxiety from Social Media Using Interpretable Linguistic Features: A Multi-Faceted Validation Study with Author-Disjoint Evaluation

Arnab Das Utsa

arXiv:2601.11758 · cs.CL · January 21, 2026

Open Access

TL;DR

This study develops an interpretable, linguistically grounded model for detecting anxiety from social media posts, validated across multiple datasets and demonstrating robustness and early detection capabilities.

Contribution

It introduces a transparent, linguistically interpretable approach for social media-based anxiety detection with rigorous validation and robustness testing.

Findings

01

High accuracy maintained after sentiment removal

02

Early detection significantly outperforms random chance

03

Model generalizes well across domains and aligns with clinical data

Abstract

Anxiety affects hundreds of millions of individuals globally, yet large-scale screening remains limited. Social media language provides an opportunity for scalable detection, but current models often lack interpretability, keyword-robustness validation, and rigorous user-level data integrity. This work presents a transparent approach to social media-based anxiety detection through linguistically interpretable, feature-grounded modeling and cross-domain validation. Using a substantial dataset of Reddit posts drawn from carefully curated subreddits, we trained a logistic regression classifier with separate training, validation, and test splits. Comprehensive evaluation included feature ablation, keyword masking experiments, and varying-density difference analyses comparing anxious and control groups, along with external validation using clinically interviewed participants with diagnosed anxiety disorders.…
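The two data-integrity ideas named in the title and abstract, author-disjoint evaluation and keyword masking, can be illustrated with a minimal sketch. This is not the paper's pipeline. The keyword list, the 80/10/10 split ratios, the `[MASK]` token, and the function names `mask_keywords`, `assign_split`, and `author_disjoint_split` are all assumptions made for illustration. The sketch hashes each author ID to a split so that no author's posts appear in more than one split, and it masks explicit anxiety terms so that a downstream classifier cannot rely on them.

```python
import hashlib
import re

# Hypothetical keyword list; the paper's actual masked vocabulary is not stated here.
ANXIETY_KEYWORDS = ("anxiety", "anxious", "panic", "worry", "worried")
_KEYWORD_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, ANXIETY_KEYWORDS)) + r")\b",
    re.IGNORECASE,
)


def mask_keywords(text, token="[MASK]"):
    """Replace explicit anxiety keywords with a placeholder token."""
    return _KEYWORD_RE.sub(token, text)


def assign_split(author, train=0.8, val=0.1):
    """Deterministically map an author ID to one split.

    Hashing the author ID (rather than the post) guarantees that all posts
    by the same author land in the same split. The ratios are assumptions.
    """
    digest = hashlib.sha256(author.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    if bucket < train:
        return "train"
    if bucket < train + val:
        return "val"
    return "test"


def author_disjoint_split(posts):
    """Split (author, text) pairs by author and mask keywords in each text."""
    splits = {"train": [], "val": [], "test": []}
    for author, text in posts:
        splits[assign_split(author)].append((author, mask_keywords(text)))
    return splits
```

Because the assignment depends only on the author ID, rerunning the split on new posts from a known author keeps that author in the same partition, which is one simple way to avoid user-level leakage between training and test data.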

Taxonomy

Topics: Mental Health via Writing · Sentiment Analysis and Opinion Mining · Digital Mental Health Interventions