PhishGAN: Data Augmentation and Identification of Homoglpyh Attacks

Joon Sern Lee; Gui Peng David Yam; Jin Hao Chan

arXiv:2006.13742 · cs.CV · January 11, 2021

TL;DR

This paper introduces PhishGAN, a GAN-based method for generating homoglyph images to improve phishing detection, addressing dataset scarcity and computational challenges in current state-of-the-art approaches.

Contribution

The paper presents a novel GAN model, PhishGAN, for generating diverse homoglyph images and a workflow combining it with a homoglyph identifier to enhance phishing detection capabilities.

Findings

01

PhishGAN can generate varied homoglyph images conditioned on input text.

02

The combined workflow improves detection of homoglyph-based phishing attacks.

03

Dataset generation on demand enables rapid adaptation to new threats.

Abstract

Homoglyph attacks are a common technique used by hackers to conduct phishing. Domain names or links that are visually similar to actual ones are created via punycode to obfuscate the attack, making the victim more susceptible to phishing. For example, victims may mistake "|inkedin.com" for "linkedin.com" and, in the process, divulge personal details to the fake website. Current state-of-the-art (SOTA) approaches typically rely on string comparison algorithms (e.g. Levenshtein distance), which are computationally heavy. One reason for this is the lack of publicly available datasets, which hinders the training of more advanced Machine Learning (ML) models. Furthermore, no single font is able to render all types of punycode correctly, posing a significant challenge to the creation of a dataset that is unbiased toward any particular font. This coupled with the vast number of internet domains pose a…
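To make the string-comparison baseline mentioned in the abstract concrete, here is a minimal sketch of Levenshtein distance applied to the "|inkedin.com" example. It is not the authors' code: the function names `levenshtein` and `closest_domain` and the three-entry `KNOWN` list are hypothetical, chosen only for illustration. The sketch also hints at why this approach gets expensive, since every candidate has to be compared against every known domain.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) using two DP rows."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def closest_domain(candidate: str, known_domains: list[str]) -> tuple[str, int]:
    """Return the known domain nearest to `candidate` and its distance.

    Cost grows with len(known_domains) times the product of the string
    lengths, which is the scaling concern raised in the abstract.
    """
    scored = ((domain, levenshtein(candidate, domain)) for domain in known_domains)
    return min(scored, key=lambda pair: pair[1])


# Hypothetical allow-list; a real deployment would hold far more entries.
KNOWN = ["linkedin.com", "google.com", "paypal.com"]

match, distance = closest_domain("|inkedin.com", KNOWN)
print(match, distance)  # prints "linkedin.com 1"
```

A distance of 1 flags the candidate as a near-duplicate of a known domain, but the measure treats every character substitution as equally suspicious, whether or not the two characters look alike on screen.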
