TL;DR
This paper detects homoglyph attacks with a Siamese CNN that learns the visual similarity of rendered strings. It reports a 13% to 45% improvement in ROC AUC over string-comparison baselines and uses KD-Trees for fast similarity search.
Contribution
The study presents a new image-based, deep learning method for homoglyph detection that outperforms existing string comparison algorithms and provides publicly available datasets and code.
Findings
13% to 45% improvement in ROC AUC over baselines
Fast similarity search using KD-Trees
Effective detection of visually similar homoglyphs
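To make the KD-Tree finding concrete, here is a minimal sketch of nearest-neighbor lookup over embedding vectors. It is not the authors' implementation: the 2-D vectors, the names `build_kdtree` and `nearest`, and the example file names other than svchost.exe are hypothetical placeholders standing in for whatever feature vectors the trained network would produce.

```python
import math

def build_kdtree(points, depth=0):
    """Build a KD-tree from a list of (vector, label) pairs."""
    if not points:
        return None
    axis = depth % len(points[0][0])            # cycle through the dimensions
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2                      # median point becomes the node
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, query, best=None):
    """Return (distance, label) of the stored vector closest to query."""
    if node is None:
        return best
    vector, label = node["point"]
    dist = math.dist(vector, query)
    if best is None or dist < best[0]:
        best = (dist, label)
    diff = query[node["axis"]] - vector[node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Only descend into the far side if the splitting plane is closer than the best match.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best

# Hypothetical embeddings of known-good names (assumed values, for illustration only).
known = [
    ((0.10, 0.20), "svchost.exe"),
    ((0.90, 0.80), "explorer.exe"),
    ((0.50, 0.10), "lsass.exe"),
    ((0.30, 0.90), "winlogon.exe"),
]
tree = build_kdtree(known)

# Hypothetical embedding of a suspicious name such as svch0st.exe.
distance, match = nearest(tree, (0.12, 0.21))
```

The pruning step is what makes the lookup fast: a subtree is skipped whenever the splitting plane lies farther from the query than the best match found so far, so most stored names are never compared.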
Abstract
A homoglyph (name spoofing) attack is a common technique used by adversaries to obfuscate file and domain names. This technique creates process or domain names that are visually similar to legitimate and recognized names. For instance, an attacker may create malware with the name svch0st.exe so that in a visual inspection of running processes or a directory listing, the process or file name might be mistaken for the Windows system process svchost.exe. There has been limited published research on detecting homoglyph attacks. Current approaches rely on string comparison algorithms (such as Levenshtein distance) that result in computationally heavy solutions with a high number of false positives. In addition, few publicly available datasets exist for reproducible research, and most of them focus on phishing attacks, in which homoglyphs are not always used.…
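The string-comparison baseline mentioned in the abstract can be sketched as follows. This is an illustrative implementation of Levenshtein distance, not code from the paper; the helper `flag_lookalikes`, its `max_distance` threshold, and the name svchast.exe are assumptions introduced for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a                       # keep the inner row as short as possible
    previous = list(range(len(b) + 1))    # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,          # delete ch_a
                current[j - 1] + 1,       # insert ch_b
                previous[j - 1] + cost,   # substitute or match
            ))
        previous = current
    return previous[-1]

def flag_lookalikes(candidate, known_names, max_distance=1):
    """Hypothetical helper: known names within max_distance edits, excluding exact matches."""
    return [n for n in known_names if 0 < levenshtein(candidate, n) <= max_distance]

known_names = ["svchost.exe", "explorer.exe"]
spoofed = flag_lookalikes("svch0st.exe", known_names)
not_lookalike = flag_lookalikes("svchast.exe", known_names)
```

Both calls flag svchost.exe, because edit distance scores the substitution of "0" for "o" exactly the same as "a" for "o", even though only the first is visually confusable. That blindness to appearance is one plausible source of the false positives the abstract describes, and comparing a candidate against every known name costs one quadratic-time computation per pair.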
