PhishZip: A New Compression-based Algorithm for Detecting Phishing Websites

Rizka Purwanto; Arindam Pal; Alan Blair; Sanjay Jha

arXiv:2007.11955·cs.CR·July 24, 2020

TL;DR

PhishZip introduces a novel compression-based method for detecting phishing websites, outperforming previous HTML feature-based approaches by leveraging compression ratios and systematic dictionary construction.

Contribution

The paper presents a phishing detection approach that classifies websites with a compression algorithm, gives a systematic method for constructing the word dictionaries used by the compression models, and demonstrates the effectiveness of compression ratios as machine learning features.

Findings

01

True positive rate of 80.04% for PhishZip

02

Using compression ratios as features improves detection accuracy by 11.84%

03

True positive rate increases by 30.3% when compression ratios are added as features

Abstract

Phishing has grown significantly in the past few years and is predicted to further increase in the future. The dynamics of phishing introduce challenges in implementing a robust phishing detection system and selecting features which can represent phishing despite the change of attack. In this paper, we propose PhishZip which is a novel phishing detection approach using a compression algorithm to perform website classification and demonstrate a systematic way to construct the word dictionaries for the compression models using word occurrence likelihood analysis. PhishZip outperforms the use of best-performing HTML-based features in past studies, with a true positive rate of 80.04%. We also propose the use of compression ratio as a novel machine learning feature which significantly improves machine learning based phishing detection over previous studies. Using compression ratios as…
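
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of zlib (DEFLATE) with a preset dictionary, the whitespace tokenization, the add-one smoothing, the `top_k` cutoff, and all function names and toy corpora are assumptions made for illustration. The sketch builds one word dictionary per class from words that are more likely to occur in that class than in the other, then labels a page by whichever dictionary yields the higher compression ratio.

```python
import zlib
from collections import Counter


def build_dictionary(target_docs, other_docs, top_k=50):
    """Pick words far more likely to occur in target_docs than in other_docs.

    Hypothetical helper: likelihood = smoothed fraction of documents containing the word.
    """
    target = Counter(w for d in target_docs for w in set(d.lower().split()))
    other = Counter(w for d in other_docs for w in set(d.lower().split()))

    def score(word):
        p_target = (target[word] + 1) / (len(target_docs) + 2)  # add-one smoothing
        p_other = (other[word] + 1) / (len(other_docs) + 2)
        return p_target / p_other

    words = sorted(target, key=lambda w: (-score(w), w))[:top_k]
    # zlib recommends placing the most frequent strings at the END of the dictionary.
    return " ".join(reversed(words)).encode("utf-8")


def compression_ratio(text, zdict):
    """Uncompressed size divided by compressed size, using a preset dictionary."""
    raw = text.encode("utf-8")
    comp = zlib.compressobj(level=9, zdict=zdict)
    compressed = comp.compress(raw) + comp.flush()
    return len(raw) / len(compressed)


def classify(text, phish_dict, legit_dict):
    """Label text by the dictionary that compresses it better (ties go to legitimate)."""
    if compression_ratio(text, phish_dict) > compression_ratio(text, legit_dict):
        return "phishing"
    return "legitimate"


# Toy corpora, invented purely for illustration.
PHISH_DOCS = [
    "verify your account password login suspended",
    "confirm password login account verify urgently",
]
LEGIT_DOCS = [
    "welcome to our blog about gardening recipes",
    "read our latest news about gardening and recipes",
]

PHISH_DICT = build_dictionary(PHISH_DOCS, LEGIT_DOCS)
LEGIT_DICT = build_dictionary(LEGIT_DOCS, PHISH_DOCS)

print(classify("please verify your account password login", PHISH_DICT, LEGIT_DICT))
print(classify("welcome to our blog about gardening", PHISH_DICT, LEGIT_DICT))
```

The two ratios returned by `compression_ratio` could also be passed to a conventional classifier as extra numeric features, which is the second use of compression ratios the abstract describes.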
