MATCHED: Multimodal Authorship-Attribution To Combat Human Trafficking in Escort-Advertisement Data

Vageesh Saxena; Benjamin Bashpole; Gijs Van Dijck; Gerasimos Spanakis

arXiv:2412.13794 · cs.CL · December 19, 2024

TL;DR

This paper introduces MATCHED, a multimodal dataset and benchmark for authorship attribution in escort ads, demonstrating that combining text and images improves detection of traffickers and enhances law enforcement capabilities.

Contribution

The study creates a large multimodal dataset and evaluates multimodal models, showing improved performance over text-only or vision-only approaches for identifying traffickers in escort ads.

Findings

01. Multimodal features outperform unimodal baselines in vendor identification.

02. Visual cues provide stylistic information that complements textual analysis.

03. End-to-end multimodal training is more effective than separate or alignment-based methods.

Abstract

Human trafficking (HT) remains a critical issue, with traffickers increasingly leveraging online escort advertisements (ads) to advertise victims anonymously. Existing detection methods, including Authorship Attribution (AA), often center on text-based analyses and neglect the multimodal nature of online escort ads, which typically pair text with images. To address this gap, we introduce MATCHED, a multimodal dataset of 27,619 unique text descriptions and 55,115 unique images collected from the Backpage escort platform across seven U.S. cities in four geographical regions. Our study extensively benchmarks text-only, vision-only, and multimodal baselines for vendor identification and verification tasks, employing multitask (joint) training objectives that achieve superior classification and retrieval performance on in-distribution and out-of-distribution (OOD) datasets. Integrating…

Code & Models

Repositories

vageeshsaxena/matched
PyTorch · Official

Videos

MATCHED: Multimodal Authorship-Attribution To Combat Human Trafficking in Escort-Advertisement Data

Taxonomy

Topics: Sex work and related issues · Asian Culture and Media Studies

Methods: Contrastive Language-Image Pre-training