Identifying Products in Online Cybercrime Marketplaces: A Dataset for Fine-grained Domain Adaptation

Greg Durrett; Jonathan K. Kummerfeld; Taylor Berg-Kirkpatrick; Rebecca S. Portnoff; Sadia Afroz; Damon McCoy; Kirill Levchenko; Vern Paxson

arXiv:1708.09609·cs.CL·June 5, 2020

TL;DR

This paper introduces a dataset and analysis for identifying products in online cybercrime forums, highlighting cross-domain challenges and the limited effectiveness of existing domain adaptation techniques.

Contribution

The paper provides a new annotated dataset drawn from four cybercrime forums and analyzes why domain adaptation is difficult for the product identification task.

Findings

01 Supervised models lose accuracy when applied to forums they were not trained on.

02 Standard semi-supervised learning and domain adaptation techniques have limited effectiveness on this task.

03 The dataset enables future research on cross-domain cybercrime NLP tasks.

Abstract

One weakness of machine-learned NLP models is that they typically perform poorly on out-of-domain data. In this work, we study the task of identifying products being bought and sold in online cybercrime forums, which exhibits particularly challenging cross-domain effects. We formulate a task that represents a hybrid of slot-filling information extraction and named entity recognition and annotate data from four different forums. Each of these forums constitutes its own "fine-grained domain" in that the forums cover different market sectors with different properties, even though all forums are in the broad domain of cybercrime. We characterize these domain differences in the context of a learning-based system: supervised models see decreased accuracy when applied to new forums, and standard techniques for semi-supervised learning and domain adaptation have limited effectiveness on this…
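The cross-forum effect described in the abstract can be illustrated with a minimal sketch. It assumes products are marked at the token level (the abstract calls the task a hybrid of slot-filling and named entity recognition but does not spell out the annotation scheme), and it uses a simple memorized-lexicon tagger as a stand-in, not the models studied in the paper. The posts, labels, and function names below are hypothetical toy examples, not drawn from the dataset.

```python
def train_lexicon(posts):
    """Memorize every token labeled as a product (label 1) in the training posts."""
    return {
        tok.lower()
        for tokens, labels in posts
        for tok, lab in zip(tokens, labels)
        if lab == 1
    }


def predict(lexicon, tokens):
    """Tag a token as a product only if it was seen as one during training."""
    return [1 if tok.lower() in lexicon else 0 for tok in tokens]


def token_f1(lexicon, posts):
    """Token-level F1 of the lexicon tagger against gold product labels."""
    tp = fp = fn = 0
    for tokens, gold in posts:
        for p, g in zip(predict(lexicon, tokens), gold):
            if p == 1 and g == 1:
                tp += 1
            elif p == 1 and g == 0:
                fp += 1
            elif p == 0 and g == 1:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy posts: (tokens, labels), where 1 marks a product token.
forum_a_train = [
    ("selling fresh accounts with email access".split(), [0, 0, 1, 0, 0, 0]),
    ("wts crypter fud lifetime license".split(), [0, 1, 0, 0, 0]),
]
forum_a_test = [("buying accounts in bulk".split(), [0, 1, 0, 0])]
forum_b_test = [("need a custom keylogger coded".split(), [0, 0, 0, 1, 0])]

lexicon = train_lexicon(forum_a_train)
in_domain_f1 = token_f1(lexicon, forum_a_test)
out_of_domain_f1 = token_f1(lexicon, forum_b_test)
print(f"in-domain F1: {in_domain_f1:.2f}, out-of-domain F1: {out_of_domain_f1:.2f}")
```

The toy tagger recovers the product it has already seen in the same forum but misses a product type that only appears in the other forum. This is the kind of gap that a train-on-one-forum, test-on-another evaluation is meant to expose; real models generalize better than a memorized lexicon, so the size of the gap here is purely illustrative.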

Code & Models

Repositories

ccied/ugforum-analysis (Official)
