Combatting Human Trafficking in the Cyberspace: A Natural Language Processing-Based Methodology to Analyze the Language in Online Advertisements
Alejandro Rodriguez Perez, Pablo Rivas

TL;DR
This paper presents a novel NLP-based methodology utilizing transformer models and interpretability techniques to analyze online advertisements for human trafficking risk, providing explainable insights and a scalable approach for law enforcement.
Contribution
It introduces a minimal-supervision pseudo-labeling approach and applies Transformer models, paired with an Integrated Gradients interpretability framework, to detect human trafficking activity in online advertisements.
Findings
Effective pseudo-labeled dataset generation for trafficking detection
Transformer models achieve high accuracy in risk prediction
Interpretability framework aids law enforcement understanding
Abstract
This project tackles the pressing issue of human trafficking in online C2C marketplaces through advanced Natural Language Processing (NLP) techniques. We introduce a novel methodology for generating pseudo-labeled datasets with minimal supervision, serving as a rich resource for training state-of-the-art NLP models. Focusing on tasks like Human Trafficking Risk Prediction (HTRP) and Organized Activity Detection (OAD), we employ cutting-edge Transformer models for analysis. A key contribution is the implementation of an interpretability framework using Integrated Gradients, providing explainable insights crucial for law enforcement. This work not only fills a critical gap in the literature but also offers a scalable, machine learning-driven approach to combat human exploitation online. It serves as a foundation for future research and practical applications, emphasizing the role of…
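The abstract names Integrated Gradients as the basis of the interpretability framework. As a rough illustration of that technique only, the sketch below attributes a prediction to its input features for a toy logistic model, not for the Transformer models the paper uses. The token names, weights, bias, all-zeros baseline, and step count are hypothetical values chosen for the example and do not come from the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(x, w, b):
    # Toy stand-in for a risk classifier: logistic regression over token counts.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def model_grad(x, w, b):
    # Analytic gradient of the logistic output with respect to each input feature.
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]

def integrated_gradients(x, baseline, w, b, steps=200):
    # Approximate the path integral of gradients along the straight line
    # from the baseline to the input, using the midpoint rule.
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [x0 + alpha * (xi - x0) for xi, x0 in zip(x, baseline)]
        grad = model_grad(point, w, b)
        total = [t + g for t, g in zip(total, grad)]
    # Scale the averaged gradient by the input's distance from the baseline.
    return [(xi - x0) * t / steps for xi, x0, t in zip(x, baseline, total)]

# Hypothetical vocabulary and parameters, for illustration only.
tokens = ["tok_a", "tok_b", "tok_c", "tok_d"]
weights = [1.2, -0.4, 2.0, 0.5]
bias = -1.0

x = [1.0, 0.0, 1.0, 1.0]          # token counts for one advertisement
baseline = [0.0, 0.0, 0.0, 0.0]   # assumed "empty text" reference input

attributions = integrated_gradients(x, baseline, weights, bias)
for tok, a in zip(tokens, attributions):
    print(f"{tok}: {a:+.4f}")

# Completeness check: attributions should sum to F(x) - F(baseline).
gap = model(x, weights, bias) - model(baseline, weights, bias)
print(f"sum of attributions = {sum(attributions):.4f}, output gap = {gap:.4f}")
```

A useful sanity check in any Integrated Gradients setup is the completeness property shown in the last lines: the attributions should add up to the difference between the model output on the input and on the baseline. For a Transformer, the same path integral would typically be taken over token embeddings rather than raw counts.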
Taxonomy
Topics: Sex work and related issues · Cybercrime and Law Enforcement Studies
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Dropout · Softmax · Absolute Position Encodings · Residual Connection · Position-Wise Feed-Forward Layer · Label Smoothing
