MATCHED: Multimodal Authorship-Attribution To Combat Human Trafficking in Escort-Advertisement Data
Vageesh Saxena, Benjamin Bashpole, Gijs Van Dijck, Gerasimos Spanakis

TL;DR
This paper introduces MATCHED, a multimodal dataset and benchmark for authorship attribution in escort ads, demonstrating that combining text and images improves detection of traffickers and enhances law enforcement capabilities.
Contribution
The study creates a large multimodal dataset and evaluates multimodal models, showing improved performance over text-only or vision-only approaches for identifying traffickers in escort ads.
Findings
Multimodal features outperform unimodal baselines in vendor identification.
Visual cues provide stylistic information that complements textual analysis.
End-to-end multimodal training is more effective than separate or alignment-based methods.
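The findings above rest on representing each ad with both its text and its images. The sketch below is a minimal illustration under stated assumptions; it is not the authors' architecture. It assumes each ad already has a text embedding and an image embedding (for example, from a CLIP-style encoder). It fuses the two by L2-normalizing and concatenating them, then attributes an ad to the vendor whose prototype vector is most similar. It also treats verification as a cosine-similarity threshold. The helper names (`fuse`, `identify_vendor`, `same_vendor`), the prototype dictionary, and the 0.8 threshold are all hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (returns a copy unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(text_emb: Vector, image_emb: Vector) -> Vector:
    """Hypothetical early fusion: normalize each modality, concatenate, renormalize."""
    return l2_normalize(l2_normalize(text_emb) + l2_normalize(image_emb))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def identify_vendor(ad: Vector, prototypes: Dict[str, Vector]) -> str:
    """Identification: pick the vendor whose prototype is most similar to the ad."""
    return max(prototypes, key=lambda vendor: cosine(ad, prototypes[vendor]))


def same_vendor(ad_a: Vector, ad_b: Vector, threshold: float = 0.8) -> bool:
    """Verification: attribute two ads to one vendor if similarity >= threshold (assumed)."""
    return cosine(ad_a, ad_b) >= threshold


# Toy 2-d embeddings standing in for real text/image encoder outputs.
ad1 = fuse([1.0, 0.0], [0.0, 1.0])
ad2 = fuse([0.9, 0.1], [0.1, 0.9])
ad3 = fuse([0.0, 1.0], [1.0, 0.0])
prototypes = {"vendor_a": ad1, "vendor_b": ad3}
print(identify_vendor(ad2, prototypes))  # vendor_a
```

Concatenation keeps each modality's signal separate, so a classifier can weigh textual and visual style independently. A text-only or vision-only baseline corresponds to dropping one half of the fused vector.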
Abstract
Human trafficking (HT) remains a critical issue, with traffickers increasingly leveraging online escort advertisements (ads) to advertise victims anonymously. Existing detection methods, including Authorship Attribution (AA), often center on text-based analyses and neglect the multimodal nature of online escort ads, which typically pair text with images. To address this gap, we introduce MATCHED, a multimodal dataset of 27,619 unique text descriptions and 55,115 unique images collected from the Backpage escort platform across seven U.S. cities in four geographical regions. Our study extensively benchmarks text-only, vision-only, and multimodal baselines for vendor identification and verification tasks, employing multitask (joint) training objectives that achieve superior classification and retrieval performance on in-distribution and out-of-distribution (OOD) datasets. Integrating…
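The abstract mentions multitask (joint) training objectives but does not spell out their terms in this excerpt. The sketch below shows one generic way to combine two objectives in a single loss. The first is cross-entropy over vendor logits, for identification. The second is a symmetric CLIP-style contrastive term that pulls each ad's text embedding toward its own image embedding. Because both terms are optimized together, training is end-to-end rather than staged. The term choice, the `weight` of 0.5, and the temperature of 0.07 are assumptions for illustration, not the paper's reported configuration.

```python
import math
from typing import List

Vector = List[float]


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(logits: List[float], target: int) -> float:
    """Negative log-likelihood of the target class."""
    return -log_softmax(logits)[target]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def clip_style_loss(text: List[Vector], image: List[Vector],
                    temperature: float = 0.07) -> float:
    """Symmetric InfoNCE over a batch: the i-th text should match the i-th image.
    Assumes all embeddings are already L2-normalized."""
    n = len(text)
    sims = [[dot(t, v) / temperature for v in image] for t in text]
    # Row i: text i scored against every image; target is image i.
    text_to_image = sum(cross_entropy(sims[i], i) for i in range(n)) / n
    # Column i: image i scored against every text; target is text i.
    image_to_text = sum(cross_entropy([sims[j][i] for j in range(n)], i)
                        for i in range(n)) / n
    return (text_to_image + image_to_text) / 2


def joint_loss(vendor_logits: List[List[float]], vendor_labels: List[int],
               text: List[Vector], image: List[Vector],
               weight: float = 0.5) -> float:
    """Hypothetical multitask objective: vendor classification + weighted alignment."""
    ce = sum(cross_entropy(l, y)
             for l, y in zip(vendor_logits, vendor_labels)) / len(vendor_labels)
    return ce + weight * clip_style_loss(text, image)
```

In a real model the logits and embeddings would come from trainable encoders, and a framework with automatic differentiation would backpropagate this scalar. Plain Python is used here only to keep the arithmetic inspectable.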
Taxonomy
Topics: Sex work and related issues · Asian Culture and Media Studies
Methods: Contrastive Language-Image Pre-training
