Model-Agnostic Sentiment Distribution Stability Analysis for Robust LLM-Generated Texts Detection
Siyuan Li, Xi Lin, Guangyan Li, Zehao Liu, Aodu Wulianghai, Li Ding, Jun Wu, Jianhua Li

TL;DR
This paper introduces SentiDetect, a model-agnostic framework that detects AI-generated texts by analyzing sentiment stability, outperforming existing methods especially under adversarial and paraphrased conditions.
Contribution
The paper presents a novel sentiment distribution stability approach for LLM detection, demonstrating improved robustness and generalizability over prior lexical and classifier-based methods.
Findings
SentiDetect outperforms state-of-the-art baselines in F1 scores.
It shows increased robustness to paraphrasing and adversarial attacks.
It is effective across diverse datasets and multiple LLMs.
Abstract
The rapid advancement of large language models (LLMs) has resulted in increasingly sophisticated AI-generated content, posing significant challenges in distinguishing LLM-generated text from human-written language. Existing detection methods, primarily based on lexical heuristics or fine-tuned classifiers, often suffer from limited generalizability and are vulnerable to paraphrasing, adversarial perturbations, and cross-domain shifts. In this work, we propose SentiDetect, a model-agnostic framework for detecting LLM-generated text by analyzing the divergence in sentiment distribution stability. Our method is motivated by the empirical observation that LLM outputs tend to exhibit emotionally consistent patterns, whereas human-written texts display greater emotional variability. To capture this phenomenon, we define two complementary metrics: sentiment distribution consistency and…
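The intuition in the abstract, that LLM outputs stay emotionally consistent while human text varies more, can be illustrated with a small sketch. This is not SentiDetect's implementation: the abstract is cut off before its metrics are fully named, so the code below only measures how much a per-sentence sentiment score fluctuates within a text. The lexicon `TOY_LEXICON`, the functions `sentence_scores` and `sentiment_variability`, and the use of standard deviation as the variability measure are all assumptions made for illustration.

```python
import re
import statistics

# Hypothetical toy lexicon (an assumption for this sketch);
# a real system would use a trained sentiment model instead.
TOY_LEXICON = {
    "love": 1.0, "great": 0.8, "good": 0.5, "fine": 0.2,
    "bad": -0.5, "awful": -0.9, "hate": -1.0,
}


def sentence_scores(text):
    """Return one mean-polarity score per sentence (0.0 if no lexicon word)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    scores = []
    for sentence in sentences:
        words = re.findall(r"[a-z']+", sentence.lower())
        hits = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
        scores.append(sum(hits) / len(hits) if hits else 0.0)
    return scores


def sentiment_variability(text):
    """Population standard deviation of sentence scores (0.0 for < 2 sentences)."""
    scores = sentence_scores(text)
    return statistics.pstdev(scores) if len(scores) > 1 else 0.0


# Two made-up texts: one emotionally flat, one with swings between sentences.
steady = "The service was good. The food was good. The room was good."
varied = "I love the view. The food was awful. The room was fine."

print(sentiment_variability(steady))
print(sentiment_variability(varied))
```

Under this toy measure, the emotionally flat text scores lower variability than the text with swings. Whether such a gap separates LLM-generated from human-written text in practice is the paper's empirical claim, not something this sketch establishes.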
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Misinformation and Its Impacts · Topic Modeling
