CRATOR: a Dark Web Crawler
Daniel De Pascale, Giuseppe Cascavilla, Damian A. Tamburri, Willem-Jan Van Den Heuvel

TL;DR
This paper presents CRATOR, a dark web crawler that efficiently extracts pages protected by security measures such as captchas while maintaining anonymity. It is intended for cybersecurity and threat intelligence applications.
Contribution
The study introduces a novel dark web crawler that handles security measures like captchas and employs techniques for anonymity and detection avoidance.
Findings
Effective extraction of pages with security protocols
Maintains anonymity through user-agent rotation and proxies
Demonstrates high coverage and robustness
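The user-agent rotation and proxy usage listed above can be illustrated with a minimal sketch. It is not CRATOR's implementation: the user-agent strings, the `request_settings` helper, and the assumption of a local Tor client on its default SOCKS port (9050) are all hypothetical choices made for illustration.

```python
import itertools

# Hypothetical user-agent pool; the paper does not list the strings it uses.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/115.0",
]

# Assumption: a local Tor client listening on its default SOCKS port, 9050.
# The "socks5h" scheme asks the proxy to resolve hostnames, which .onion
# addresses require.
TOR_PROXY = "socks5h://127.0.0.1:9050"


def request_settings(user_agents=USER_AGENTS, proxy=TOR_PROXY):
    """Yield per-request settings forever, cycling through the user agents."""
    for agent in itertools.cycle(user_agents):
        yield {
            "headers": {"User-Agent": agent},
            "proxies": {"http": proxy, "https": proxy},
        }
```

Each yielded dictionary has the shape that an HTTP client such as `requests` accepts as keyword arguments (`headers=` and `proxies=`), provided SOCKS support is installed. Round-robin rotation is only one possible policy; random selection would work equally well in this sketch.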
Abstract
Dark web crawling is a complex process that involves specific methodologies and techniques to navigate the Tor network and extract data from hidden services. This study proposes a general dark web crawler designed to extract pages handling security protocols, such as captchas, efficiently. Our approach uses a combination of seed URL lists, link analysis, and scanning to discover new content. We also incorporate methods for user-agent rotation and proxy usage to maintain anonymity and avoid detection. We evaluate the effectiveness of our crawler using metrics such as coverage, performance and robustness. Our results demonstrate that our crawler effectively extracts pages handling security protocols while maintaining anonymity and avoiding detection. Our proposed dark web crawler can be used for various applications, including threat intelligence, cybersecurity, and online investigations.
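The discovery loop the abstract describes (seed URL lists plus link analysis) can be sketched as a simple frontier-based crawl. This is a hedged illustration, not the authors' algorithm: the breadth-first order, the `crawl` and `extract_onion_links` names, the `max_pages` limit, and the injected `fetch` callable are assumptions, and captcha handling is left out entirely.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_onion_links(base_url, html):
    """Return absolute, fragment-free links that point to .onion hosts."""
    parser = LinkExtractor()
    parser.feed(html)
    found = []
    for href in parser.links:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        host = urlparse(absolute).hostname or ""
        if host.endswith(".onion") and absolute not in found:
            found.append(absolute)
    return found


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl from seed URLs; fetch(url) returns HTML or None."""
    frontier = deque(seeds)
    seen = set(seeds)
    pages = {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:  # failed or blocked request: skip it, keep crawling
            continue
        pages[url] = html
        for link in extract_onion_links(url, html):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return pages
```

Passing `fetch` in as a parameter keeps the loop independent of the network layer, so the same code can be driven by a Tor-backed HTTP client in practice or by an in-memory dictionary of pages when testing.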
Taxonomy
Topics: Web Data Mining and Analysis · Advanced Malware Detection Techniques
