On the Challenges of Creating Datasets for Analyzing Commercial Sex Advertisements to Assess Human Trafficking Risk and Organized Activity
Pablo Rivas, Tomas Cerny, Alejandro Rodriguez Perez, Javier Turek, Laurie Giddens, Gisela Bichler, Stacie Petter

TL;DR
This paper discusses the difficulties in creating datasets for analyzing commercial sex ads related to human trafficking, proposing an automated, reproducible methodology to aid researchers in developing effective datasets for detection.
Contribution
The authors introduce a novel automated and reproducible methodology for building large-scale datasets from commercial sex advertisements, addressing key challenges like data scarcity and privacy.
Findings
Analyzed five million advertisements to demonstrate the methodology's effectiveness.
Identified key challenges in dataset creation for sensitive domains.
Provided a streamlined process to improve dataset quality and reproducibility.
Abstract
Our study addresses the challenges of building datasets to understand the risks associated with organized activities and human trafficking through commercial sex advertisements. These challenges include data scarcity, rapid obsolescence, and privacy concerns. Traditional approaches, which are not automated and are difficult to reproduce, fall short in addressing these issues. We have developed a reproducible and automated methodology to analyze five million advertisements. In the process, we identified further challenges in dataset creation within this sensitive domain. This paper presents a streamlined methodology to assist researchers in constructing effective datasets for combating organized crime, allowing them to focus on advancing detection technologies.
Taxonomy
Topics: Sex work and related issues
Methods: Focus
