PhishGAN: Data Augmentation and Identification of Homoglyph Attacks
Joon Sern Lee, Gui Peng David Yam, Jin Hao Chan

TL;DR
This paper introduces PhishGAN, a GAN-based method for generating homoglyph images to improve phishing detection, addressing dataset scarcity and computational challenges in current state-of-the-art approaches.
Contribution
The paper presents a novel GAN model, PhishGAN, for generating diverse homoglyph images and a workflow combining it with a homoglyph identifier to enhance phishing detection capabilities.
Findings
PhishGAN can generate varied homoglyph images conditioned on input text.
The combined workflow improves detection of homoglyph-based phishing attacks.
Dataset generation on demand enables rapid adaptation to new threats.
Abstract
Homoglyph attacks are a common technique used by hackers to conduct phishing. Domain names or links that are visually similar to actual ones are created via punycode to obfuscate the attack, making the victim more susceptible to phishing. For example, victims may mistake "|inkedin.com" for "linkedin.com" and, in the process, divulge personal details to the fake website. Current state-of-the-art (SOTA) approaches typically make use of string comparison algorithms (e.g. Levenshtein distance), which are computationally heavy. One reason for this is the lack of publicly available datasets, which hinders the training of more advanced Machine Learning (ML) models. Furthermore, no one font is able to render all types of punycode correctly, posing a significant challenge to the creation of a dataset that is unbiased toward any particular font. This coupled with the vast number of internet domains pose a…
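To make the string-comparison baseline mentioned in the abstract concrete, here is a minimal Python sketch of Levenshtein distance applied to the abstract's "|inkedin.com" example. It is an illustration under stated assumptions, not the paper's implementation: the function names `levenshtein` and `closest_domain` and the sample list of known domains are hypothetical. It also hints at why this approach gets expensive, since every candidate must be compared against every known domain.

```python
from typing import Iterable, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b using the classic two-row dynamic programme."""
    # Keep the shorter string on the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # deletion
                    current[j - 1] + 1,                   # insertion
                    previous[j - 1] + substitution_cost,  # substitution
                )
            )
        previous = current
    return previous[-1]


def closest_domain(candidate: str, known_domains: Iterable[str]) -> Tuple[str, int]:
    """Return the known domain nearest to candidate, with its distance.

    Hypothetical helper: it scans every known domain, so the cost grows
    with both the number of domains and their lengths.
    """
    return min(
        ((domain, levenshtein(candidate, domain)) for domain in known_domains),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    # Illustrative allow-list; a real deployment would have far more entries.
    known = ["google.com", "linkedin.com", "github.com"]
    print(closest_domain("|inkedin.com", known))
```

A small distance to a well-known domain (without being an exact match) is the kind of signal such a baseline would flag as a possible homoglyph attack.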
