A Flexible and Scalable Approach for Collecting Wildlife Advertisements on the Web

Juliana Barbosa; Sunandan Chakraborty; Juliana Freire

arXiv:2407.18898 · cs.IR · July 29, 2024

TL;DR

This paper introduces a scalable data collection pipeline combining web crawling and machine learning to gather and analyze wildlife trafficking ads from online marketplaces, resulting in the largest such dataset to date.

Contribution

The paper presents a novel, scalable approach that integrates crawlers and classifiers to collect wildlife ads, enabling large-scale analysis of trafficking networks.

Findings

01. Created a dataset with nearly one million ads from 41 marketplaces

02. Covered 235 species across 20 languages

03. Demonstrated the pipeline's scalability and effectiveness

Abstract

Wildlife traffickers are increasingly carrying out their activities in cyberspace. As they advertise and sell wildlife products in online marketplaces, they leave digital traces of their activity. This creates a new opportunity: by analyzing these traces, we can obtain insights into how trafficking networks work as well as how they can be disrupted. However, collecting such information is difficult. Online marketplaces sell a very large number of products and identifying ads that actually involve wildlife is a complex task that is hard to automate. Furthermore, given that the volume of data is staggering, we need scalable mechanisms to acquire, filter, and store the ads, as well as to make them available for analysis. In this paper, we present a new approach to collect wildlife trafficking data at scale. We propose a data collection pipeline that combines scoped crawlers for data…
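The acquire, filter, and store flow described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the `Ad` record, `keyword_filter`, and `run_pipeline` are hypothetical names introduced here, the sample ads are made up, and the keyword match is only a stand-in for a trained classifier.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Ad:
    """Hypothetical record for one crawled listing."""
    url: str
    title: str
    description: str


def keyword_filter(ad: Ad, keywords: Iterable[str]) -> bool:
    """Stand-in for a learned classifier: True if any keyword occurs in the ad text."""
    text = f"{ad.title} {ad.description}".lower()
    return any(k.lower() in text for k in keywords)


def run_pipeline(ads: Iterable[Ad],
                 is_relevant: Callable[[Ad], bool],
                 store: List[Ad]) -> int:
    """Stream ads through the filter and append the relevant ones to the store.

    Returns the number of ads kept. The ads are consumed one at a time,
    so the input can be a generator fed by a crawler.
    """
    kept = 0
    for ad in ads:
        if is_relevant(ad):
            store.append(ad)
            kept += 1
    return kept


if __name__ == "__main__":
    # Made-up listings standing in for crawler output.
    crawled = [
        Ad("https://example.com/1", "Vintage ivory carving", "hand carved figurine"),
        Ad("https://example.com/2", "Wooden chair", "oak dining chair"),
        Ad("https://example.com/3", "Leather boots", "python skin boots, size 9"),
    ]
    store: List[Ad] = []
    kept = run_pipeline(crawled, lambda ad: keyword_filter(ad, ["ivory", "python skin"]), store)
    print(kept, [ad.url for ad in store])
```

Passing the filter in as a callable keeps the stages decoupled, so the keyword match could be swapped for a model-based predicate without touching the acquisition or storage code.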

Code & Models

Repositories

vida-nyu/wildlife_pipeline (official, PyTorch)

Taxonomy

Topics: Digital Marketing and Social Media