PhishZip: A New Compression-based Algorithm for Detecting Phishing Websites
Rizka Purwanto, Arindam Pal, Alan Blair, Sanjay Jha

TL;DR
PhishZip introduces a novel compression-based method for detecting phishing websites, outperforming previous HTML feature-based approaches by leveraging compression ratios and systematic dictionary construction.
Contribution
The paper presents a phishing detection approach based on a compression algorithm, together with a systematic method for constructing the word dictionaries used by the compression models, and shows that compression ratios are effective machine learning features.
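To make the dictionary construction step concrete, here is a minimal sketch of ranking words by how much more likely they are to occur in one class of pages than in the other. It is an illustration only: the tokenizer, the add-one smoothing, the document-frequency likelihood ratio, and the names `tokenize` and `build_dictionary` are all assumptions, not the paper's exact procedure.

```python
import re
from collections import Counter
from typing import List, Sequence, Set


def tokenize(text: str) -> Set[str]:
    # Hypothetical tokenizer: distinct lowercase words of 3+ letters per page.
    return set(re.findall(r"[a-z]{3,}", text.lower()))


def build_dictionary(target_docs: Sequence[str],
                     other_docs: Sequence[str],
                     top_k: int = 3,
                     smoothing: float = 1.0) -> List[str]:
    """Return the top_k words most characteristic of target_docs.

    Assumption: "word occurrence likelihood" is approximated here by the
    smoothed fraction of pages in each class that contain the word.
    """
    target_counts = Counter(w for d in target_docs for w in tokenize(d))
    other_counts = Counter(w for d in other_docs for w in tokenize(d))

    def likelihood_ratio(word: str) -> float:
        p_target = (target_counts[word] + smoothing) / (len(target_docs) + 2 * smoothing)
        p_other = (other_counts[word] + smoothing) / (len(other_docs) + 2 * smoothing)
        return p_target / p_other

    # Sort by descending ratio; break ties alphabetically for determinism.
    ranked = sorted(target_counts, key=lambda w: (-likelihood_ratio(w), w))
    return ranked[:top_k]


phishing_pages = ["verify your account password", "account suspended verify now"]
legitimate_pages = ["welcome to our store", "your order has shipped"]
phishing_words = build_dictionary(phishing_pages, legitimate_pages, top_k=2)
```

Calling the same function with the arguments swapped would give a dictionary for legitimate pages. Words that appear equally often in both classes (such as "your" in the toy data) rank low and are left out.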
Findings
PhishZip achieves a true positive rate of 80.04%
Using compression ratios as machine learning features improves detection accuracy by 11.84%
Using compression ratios as machine learning features increases the true positive rate by 30.3%
Abstract
Phishing has grown significantly in the past few years and is predicted to further increase in the future. The dynamics of phishing introduce challenges in implementing a robust phishing detection system and selecting features which can represent phishing despite the change of attack. In this paper, we propose PhishZip which is a novel phishing detection approach using a compression algorithm to perform website classification and demonstrate a systematic way to construct the word dictionaries for the compression models using word occurrence likelihood analysis. PhishZip outperforms the use of best-performing HTML-based features in past studies, with a true positive rate of 80.04%. We also propose the use of compression ratio as a novel machine learning feature which significantly improves machine learning based phishing detection over previous studies. Using compression ratios as…
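The idea of classifying a page by how well it compresses can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes DEFLATE via Python's `zlib` with a preset dictionary (`zdict`), defines the compression ratio as original size divided by compressed size, and uses hypothetical names (`compression_ratio`, `classify`) and toy word lists.

```python
import zlib
from typing import Sequence


def compression_ratio(text: str, words: Sequence[str]) -> float:
    """Compress text with a preset word dictionary and return
    original_size / compressed_size (an assumed definition of the ratio)."""
    data = text.encode("utf-8")
    zdict = " ".join(words).encode("utf-8")
    compressor = zlib.compressobj(level=9, zdict=zdict)
    compressed = compressor.compress(data) + compressor.flush()
    return len(data) / len(compressed)


def classify(text: str,
             phishing_words: Sequence[str],
             legitimate_words: Sequence[str]) -> str:
    # A page that compresses better with the phishing dictionary shares
    # more vocabulary with phishing pages; ties fall back to "legitimate".
    r_phishing = compression_ratio(text, phishing_words)
    r_legitimate = compression_ratio(text, legitimate_words)
    return "phishing" if r_phishing > r_legitimate else "legitimate"


# Toy dictionaries, assumed for illustration only.
PHISHING_WORDS = ["verify", "account", "password"]
LEGITIMATE_WORDS = ["welcome", "store", "order", "shipped"]

label_a = classify("verify your account password", PHISHING_WORDS, LEGITIMATE_WORDS)
label_b = classify("welcome to our store your order has shipped",
                   PHISHING_WORDS, LEGITIMATE_WORDS)
```

The two ratios returned by `compression_ratio` could also be appended to an existing feature vector and passed to a conventional classifier, which is the sense in which the abstract describes compression ratio as a machine learning feature.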
