Finding Phish in a Haystack: A Pipeline for Phishing Classification on Certificate Transparency Logs

Arthur Drichel; Vincent Drury; Justus von Brandt; Ulrike Meyer

arXiv:2106.12343·cs.CR·June 24, 2021

TL;DR

This paper introduces a modular pipeline for evaluating phishing detection classifiers using Certificate Transparency logs, aiming to enable earlier attack detection and improve classifier effectiveness.

Contribution

It presents a flexible, open-source pipeline for dataset creation, training, and classification of CT logs, facilitating future research and classifier comparison in phishing detection.

Findings

1. Potential to improve classifiers for phishing detection in CT logs

2. Pipeline supports evaluation on live and past CT log data

3. Open-source tools enable broader research collaboration

Abstract

Current popular phishing prevention techniques mainly rely on reactive blocklists, which leave a "window of opportunity" for attackers during which victims are unprotected. One possible approach to shortening this window is to detect phishing attacks earlier, during website preparation, by monitoring Certificate Transparency (CT) logs. Previous attempts to use CT log data for phishing classification exist; however, they lack evaluations on actual CT log data. In this paper, we present a pipeline that facilitates such evaluations by addressing a number of problems that arise when working with CT log data. The pipeline includes dataset creation, training, and past or live classification of CT logs. Its modular structure makes it easy to exchange classifiers or verification sources, which supports ground truth labeling efforts and classifier comparisons. We test the pipeline on a number…
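To make the idea of exchangeable classifiers and verification sources concrete, the following is a minimal sketch and not the authors' implementation. All names (`Classifier`, `VerificationSource`, `KeywordClassifier`, `StaticBlocklist`, `run_pipeline`) are hypothetical, the keyword classifier is a toy stand-in, and the ordering (classify first, then verify flagged domains) is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol


class Classifier(Protocol):
    """Hypothetical interface: decides whether a domain looks like phishing."""

    def predict(self, domain: str) -> bool: ...


class VerificationSource(Protocol):
    """Hypothetical interface: confirms a flagged domain for ground truth labeling."""

    def is_phishing(self, domain: str) -> bool: ...


class KeywordClassifier:
    """Toy stand-in classifier: flags domains containing any given keyword."""

    def __init__(self, keywords: Iterable[str]) -> None:
        self.keywords = tuple(k.lower() for k in keywords)

    def predict(self, domain: str) -> bool:
        lowered = domain.lower()
        return any(k in lowered for k in self.keywords)


class StaticBlocklist:
    """Toy stand-in verification source backed by a fixed set of domains."""

    def __init__(self, domains: Iterable[str]) -> None:
        self.domains = {d.lower() for d in domains}

    def is_phishing(self, domain: str) -> bool:
        return domain.lower() in self.domains


@dataclass
class Result:
    domain: str
    flagged: bool   # classifier output
    verified: bool  # confirmed by the verification source (only checked if flagged)


def run_pipeline(
    domains: Iterable[str],
    classifier: Classifier,
    verifier: VerificationSource,
) -> List[Result]:
    """Classify each domain, then verify only the flagged ones."""
    results = []
    for domain in domains:
        flagged = classifier.predict(domain)
        verified = flagged and verifier.is_phishing(domain)
        results.append(Result(domain, flagged, verified))
    return results


if __name__ == "__main__":
    # Domains as they might appear in certificates; these are made-up examples.
    sample = ["login-verify.example.com", "blog.example.org", "account-login.example.net"]
    for result in run_pipeline(
        sample,
        KeywordClassifier(["login"]),
        StaticBlocklist(["login-verify.example.com"]),
    ):
        print(result)
```

Because `run_pipeline` depends only on the two interfaces, a different classifier or verification source can be passed in without changing the loop, which is the kind of exchangeability the abstract describes.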

Code & Models

Repositories

https://gitlab.com/rwth-itsec/ctl-pipeline
Official
