An Entity Resolution approach to isolate instances of Human Trafficking online
Chirag Nagpal, Kyle Miller, Benedikt Boecking, Artur Dubrawski

TL;DR
This paper presents an entity resolution pipeline that leverages proxy labels to identify and cluster online instances of human trafficking, addressing challenges of data heterogeneity and noise in law enforcement efforts.
Contribution
The paper introduces a novel entity resolution approach using proxy labels to extract trafficking-related clusters from large, noisy online data sources.
Findings
Applied the pipeline to 5 million records from backpage.com
Identified significant domain-specific characteristics of the resolved entities
Reported the approach's performance and its challenges in terms of scalability
Abstract
Human trafficking is a challenging law enforcement problem, and a large amount of such activity manifests itself on various online forums. Given the large, heterogeneous and noisy structure of this data, building models to predict instances of trafficking is an even more convoluted task. In this paper we propose an entity resolution pipeline using a notion of proxy labels, in order to extract clusters from this data with prior history of human trafficking activity. We apply this pipeline to 5M records from backpage.com and report on the performance of this approach, challenges in terms of scalability, and some significant domain specific characteristics of our resolved entities.
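To make the two ideas in the abstract concrete (resolving records into entities, then using proxy labels to pick out clusters with prior trafficking history), here is a minimal sketch. It is not the authors' method: it swaps in a simple union-find over shared exact identifiers, whereas the paper's pipeline may rely on learned or fuzzy record similarity. The field names `phone`, `email` and `proxy_label`, and the functions `resolve_entities` and `flag_clusters`, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure used to merge records into entity clusters."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def resolve_entities(records, key_fields=("phone", "email")):
    """Hypothetical resolver: records sharing any identifier value end up in one cluster."""
    uf = UnionFind(len(records))
    first_seen = {}  # (field, value) -> index of the first record carrying it
    for i, rec in enumerate(records):
        for field in key_fields:
            value = rec.get(field)
            if not value:
                continue
            key = (field, value)
            if key in first_seen:
                uf.union(first_seen[key], i)
            else:
                first_seen[key] = i
    clusters = defaultdict(list)
    for i in range(len(records)):
        clusters[uf.find(i)].append(i)
    return list(clusters.values())


def flag_clusters(records, clusters, label_field="proxy_label"):
    """Keep clusters where at least one record carries a (hypothetical) proxy label."""
    return [c for c in clusters if any(records[i].get(label_field) for i in c)]


if __name__ == "__main__":
    records = [
        {"phone": "555-0101", "email": None, "proxy_label": False},
        {"phone": "555-0101", "email": "a@example.com", "proxy_label": False},
        {"phone": "555-0199", "email": "a@example.com", "proxy_label": True},
        {"phone": "555-0300", "email": None, "proxy_label": False},
    ]
    clusters = resolve_entities(records)
    print(clusters)                          # [[0, 1, 2], [3]]
    print(flag_clusters(records, clusters))  # [[0, 1, 2]]
```

Records 0 and 1 share a phone number and records 1 and 2 share an email address, so all three collapse into one entity. Because record 2 carries a proxy label, the whole cluster is flagged. This transitive propagation is the point of resolving entities before labelling: a single labelled record can mark ads that never matched it directly.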
