Counteracting Dark Web Text-Based CAPTCHA with Generative Adversarial Learning for Proactive Cyber Threat Intelligence
Ning Zhang, Mohammadreza Ebrahimi, Weifeng Li, Hsinchun Chen

TL;DR
This paper introduces DW-GAN, a novel generative adversarial network framework designed to automatically break dark web text-based CAPTCHAs, overcoming background noise and variable character length to facilitate large-scale dark web data collection.
Contribution
The study presents a new GAN-based framework that effectively recognizes complex dark web CAPTCHAs, reducing reliance on human effort and surpassing existing automated methods.
Findings
Achieved a success rate of over 94.4% on real-world dark web CAPTCHA datasets.
Outperformed state-of-the-art benchmark methods across multiple testbeds.
Effectively countered background noise and variable character length in CAPTCHA images.
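This summary does not define how the success rate is measured. A minimal sketch, assuming the common convention for variable-length text CAPTCHAs that an image counts as solved only when the entire predicted string matches the ground truth exactly (so a wrong character count is also a failure). The function name `success_rate` and the sample strings are hypothetical and not taken from the paper.

```python
def success_rate(predictions, labels):
    """Fraction of CAPTCHA images whose full predicted string equals the label.

    Assumption: exact, case-sensitive, whole-string match. The paper may use
    a different definition.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one labelled CAPTCHA is required")
    # A prediction with one wrong or missing character counts as a failure.
    solved = sum(pred == truth for pred, truth in zip(predictions, labels))
    return solved / len(labels)


if __name__ == "__main__":
    # Hypothetical predictions for four CAPTCHAs of varying length.
    preds = ["a7Kq", "93xbT", "mm2", "Zp41x"]
    truth = ["a7Kq", "93xbT", "mm2z", "Zp41x"]
    print(f"success rate: {success_rate(preds, truth):.2%}")
```

Under this convention, a per-character accuracy would be higher than the reported figure, since a single misread character fails the whole image.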
Abstract
Automated monitoring of dark web (DW) platforms on a large scale is the first step toward developing proactive Cyber Threat Intelligence (CTI). While there are efficient methods for collecting data from the surface web, large-scale dark web data collection is often hindered by anti-crawling measures. In particular, text-based CAPTCHA serves as the most prevalent and prohibiting type of these measures in the dark web. Text-based CAPTCHA identifies and blocks automated crawlers by forcing the user to enter a combination of hard-to-recognize alphanumeric characters. In the dark web, CAPTCHA images are meticulously designed with additional background noise and variable character length to prevent automated CAPTCHA breaking. Existing automated CAPTCHA breaking methods have difficulties in overcoming these dark web challenges. As such, solving dark web text-based CAPTCHA has been relying…
Taxonomy
Topics: Advanced Malware Detection Techniques · Digital and Cyber Forensics · Cybercrime and Law Enforcement Studies
