Text Analysis in Adversarial Settings: Does Deception Leave a Stylistic Trace?
Tommi Gröndahl, N. Asokan

TL;DR
This literature review asks whether deception leaves stylistic traces in text. It finds that linguistic features indicative of deception in one corpus fail to generalize across semantic domains, suggests textual similarity measures as a better basis for classification, and highlights the limitations of current style obfuscation techniques.
Contribution
The paper provides a comprehensive review of linguistic features for deception detection, surveys the literature on author identification and style obfuscation methods, and proposes future research directions.
Findings
Linguistic features often do not generalize across domains.
The authors suggest that textual similarity measures are a better means of detecting deception than stylistic features.
Current style obfuscation methods are unreliable and alter semantics.
Abstract
Textual deception constitutes a major problem for online security. Many studies have argued that deceptiveness leaves traces in writing style, which could be detected using text classification techniques. By conducting an extensive literature review of existing empirical work, we demonstrate that while certain linguistic features have been indicative of deception in certain corpora, they fail to generalize across divergent semantic domains. We suggest that deceptiveness as such leaves no content-invariant stylistic trace, and textual similarity measures provide superior means of classifying texts as potentially deceptive. Additionally, we discuss forms of deception beyond semantic content, focusing on hiding author identity by writing style obfuscation. Surveying the literature on both author identification and obfuscation techniques, we conclude that current style transformation…
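To make the idea of a textual similarity measure concrete, here is a minimal sketch of one possible measure: bag-of-words cosine similarity between a candidate text and a set of reference texts. The abstract does not say which similarity measure the authors have in mind, so the choice of cosine similarity, the tokenizer, and the function names `tokenize`, `cosine_similarity`, and `max_similarity` are all illustrative assumptions, not the paper's method.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphanumeric runs (an assumption,
    # not taken from the paper).
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine_similarity(a, b):
    # Cosine similarity between bag-of-words count vectors of two texts.
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def max_similarity(candidate, references):
    # Highest similarity between the candidate and any reference text.
    return max((cosine_similarity(candidate, r) for r in references), default=0.0)
```

A score like this depends on content overlap with the reference texts rather than on a content-invariant stylistic signature, which is the contrast the abstract draws. Any decision threshold on the score would have to be chosen per corpus.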
Taxonomy
Topics: Authorship Attribution and Profiling · Hate Speech and Cyberbullying Detection · Spam and Phishing Detection
