A Survey of Relevant Text Mining Technology
Claudia Peersman, Matthew Edwards, Emma Williams, Awais Rashid

TL;DR
This survey reviews current text mining techniques for social media and cybercriminal analysis, highlighting challenges like data variability, noise, and deception, and discusses future research directions.
Contribution
It provides a comprehensive overview of existing methods and identifies gaps in addressing domain-specific challenges in social media text mining.
Findings
Current methods address some challenges but lack robustness against noise and deception.
Identifies key areas needing further research, such as adversarial behaviour detection.
Highlights the importance of domain-specific adaptations in text mining.
Abstract
Recent advances in text mining and natural language processing technology have enabled researchers to detect an author's identity or demographic characteristics, such as age and gender, in several text genres by automatically analysing the variation of linguistic characteristics. However, applying such techniques in the wild, i.e., in both cybercriminal and regular online social media, differs from more general applications in that its defining characteristics are both domain- and process-dependent. This gives rise to a number of challenges of which contemporary research has only scratched the surface. More specifically, a text mining approach applied to social media communications typically has no control over the dataset size: the number of available communications will vary across users. Hence, the system has to be robust towards limited data availability. Additionally, the quality of…
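To make "analysing the variation of linguistic characteristics" concrete, the sketch below shows one common authorship attribution baseline: character trigram frequency profiles compared by cosine similarity, with each unknown text assigned to the most similar candidate author. This technique is chosen here for illustration only and is not presented as the survey's method. The function names (`char_ngrams`, `cosine`, `attribute`) and the example messages are hypothetical. Pooling all of a user's messages into one profile is one simple way to cope with the varying number of communications per user, although profiles built from very few short messages remain unreliable.

```python
from collections import Counter
import math


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in lower-cased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram frequency vectors."""
    dot = sum(a[g] * b[g] for g in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile carries no stylistic evidence
    return dot / (norm_a * norm_b)


def attribute(unknown: str, profiles: dict[str, list[str]], n: int = 3):
    """Return the candidate author whose pooled profile is most similar.

    Hypothetical helper: each author's messages are pooled into a single
    n-gram profile, so authors with few messages still get a profile.
    """
    target = char_ngrams(unknown, n)
    best_author, best_score = None, -1.0
    for author, messages in profiles.items():
        pooled = Counter()
        for msg in messages:
            pooled.update(char_ngrams(msg, n))
        score = cosine(target, pooled)
        if score > best_score:
            best_author, best_score = author, score
    return best_author


if __name__ == "__main__":
    # Invented example data, for illustration only.
    profiles = {
        "alice": ["lol gonna be late again!!", "lol see u there"],
        "bob": ["Please find the report attached.", "Kind regards, Bob."],
    }
    print(attribute("lol running late!!", profiles))
```

In practice, attribution and profiling systems typically feed such n-gram features into a trained classifier rather than a nearest-profile rule; the nearest-profile version is used here only because it is short and needs no training step.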
Taxonomy
Topics: Authorship Attribution and Profiling · Spam and Phishing Detection · Hate Speech and Cyberbullying Detection
