A Flexible and Scalable Approach for Collecting Wildlife Advertisements on the Web
Juliana Barbosa, Sunandan Chakraborty, Juliana Freire

TL;DR
This paper introduces a scalable data collection pipeline combining web crawling and machine learning to gather and analyze wildlife trafficking ads from online marketplaces, resulting in the largest such dataset to date.
Contribution
The paper presents a scalable approach that integrates scoped crawlers with classifiers to collect wildlife ads, enabling large-scale analysis of trafficking networks.
Findings
Created a dataset with nearly one million ads from 41 marketplaces
Covered 235 species, with ads in 20 languages
Demonstrated the pipeline's scalability and effectiveness
Abstract
Wildlife traffickers are increasingly carrying out their activities in cyberspace. As they advertise and sell wildlife products in online marketplaces, they leave digital traces of their activity. This creates a new opportunity: by analyzing these traces, we can obtain insights into how trafficking networks work as well as how they can be disrupted. However, collecting such information is difficult. Online marketplaces sell a very large number of products and identifying ads that actually involve wildlife is a complex task that is hard to automate. Furthermore, given that the volume of data is staggering, we need scalable mechanisms to acquire, filter, and store the ads, as well as to make them available for analysis. In this paper, we present a new approach to collect wildlife trafficking data at scale. We propose a data collection pipeline that combines scoped crawlers for data…
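The crawl-then-filter idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the in-memory `PAGES` dictionary, the `market.example` domain, the `WILDLIFE_TERMS` keyword set, and every function name are hypothetical, and the keyword match is only a stand-in for the machine-learning classifiers the paper refers to. The sketch shows a crawler that stays within one marketplace domain (the "scoped" part) and a filter step that keeps only pages that look like wildlife ads.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real pipeline would fetch pages over HTTP instead.
PAGES = {
    "https://market.example/": (
        "Home",
        [
            "https://market.example/ad/1",
            "https://market.example/ad/2",
            "https://other.example/",
        ],
    ),
    "https://market.example/ad/1": ("Carved elephant ivory figurine for sale", []),
    "https://market.example/ad/2": (
        "Used mountain bike, good condition",
        ["https://market.example/ad/1"],
    ),
    "https://other.example/": ("Off-site page about ivory", []),
}

# Illustrative keyword list only; a stand-in for a trained classifier.
WILDLIFE_TERMS = {"ivory", "pangolin", "rhino"}


def in_scope(url, domain):
    """Return True if the URL belongs to the marketplace being crawled."""
    return urlparse(url).netloc == domain


def scoped_crawl(seed, domain, pages):
    """Breadth-first crawl that never follows links outside `domain`."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if url not in pages:
            continue  # page could not be "fetched"
        text, links = pages[url]
        yield url, text
        for link in links:
            if link not in seen and in_scope(link, domain):
                seen.add(link)
                queue.append(link)


def looks_like_wildlife_ad(text):
    """Toy filter: True if any illustrative wildlife term appears in the text."""
    return bool(set(text.lower().split()) & WILDLIFE_TERMS)


def collect(seed, domain, pages):
    """Crawl one marketplace and keep only the URLs flagged by the filter."""
    return [
        url
        for url, text in scoped_crawl(seed, domain, pages)
        if looks_like_wildlife_ad(text)
    ]


if __name__ == "__main__":
    print(collect("https://market.example/", "market.example", PAGES))
```

Keeping the crawler and the filter as separate functions means the keyword check could be replaced by a trained model without touching the crawl logic, which is one plausible way to read the abstract's description of a pipeline that must scale across many marketplaces.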
Taxonomy
Topics: Digital Marketing and Social Media
