Leveraging AI to optimize website structure discovery during Penetration Testing
Diego Antonelli, Roberta Cascella, Gaetano Perrone, Simon Pietro Romano, Antonio Schiano

TL;DR
This paper introduces an AI-driven method using semantic clustering to optimize directory discovery in web penetration testing, significantly reducing resource use and increasing efficiency.
Contribution
It presents a novel AI-based approach employing semantic clustering and next-word prediction to enhance dirbusting, outperforming traditional brute-force techniques.
Findings
Up to 50% performance improvement in directory discovery
Semantic clustering outperforms brute force methods
Effective across multiple web applications
Abstract
Dirbusting is a technique used to brute-force directory and file names on web servers while monitoring HTTP responses, in order to enumerate server contents. It uses lists of common words to discover the hidden structure of the target website, and typically relies on response codes as discovery conditions to find new pages. It is widely used in web application penetration testing, an activity that allows companies to detect website vulnerabilities. Dirbusting techniques are both time- and resource-consuming, and innovative approaches have never been explored in this field. We hence propose an advanced technique to optimize the dirbusting process by leveraging Artificial Intelligence. More specifically, we use semantic clustering techniques to organize wordlist items into different groups according to their semantic meaning. The created clusters are used in…
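To make the baseline concrete, the following is a minimal sketch of the traditional brute-force dirbusting loop the abstract describes: each wordlist entry is appended to the base URL, and the HTTP status code decides whether the path counts as discovered. It is not the authors' implementation. The set of "discovery" status codes, the `dirbust` and `fake_probe` names, and the `target.example` host are all assumptions made for illustration, and the probe is a simulated lookup rather than a real HTTP request so the example runs offline.

```python
from typing import Callable, Iterable, List, Tuple

# Assumption: status codes treated as evidence that a path exists.
# Real tools make this configurable; this set is illustrative only.
DISCOVERY_CODES = {200, 204, 301, 302, 307, 401, 403}


def dirbust(
    base_url: str,
    wordlist: Iterable[str],
    probe: Callable[[str], int],
) -> List[Tuple[str, int]]:
    """Try every wordlist entry against base_url, in order.

    `probe` takes a URL and returns an HTTP status code. It is injected
    so that the loop can be exercised without touching a network.
    """
    found = []
    for word in wordlist:
        url = f"{base_url.rstrip('/')}/{word.strip('/')}"
        status = probe(url)
        if status in DISCOVERY_CODES:
            found.append((url, status))
    return found


# Hypothetical target, simulated as a mapping from URL to status code.
FAKE_SITE = {
    "http://target.example/admin": 403,
    "http://target.example/login": 200,
}


def fake_probe(url: str) -> int:
    """Stand-in for an HTTP request: unknown paths answer 404."""
    return FAKE_SITE.get(url, 404)


wordlist = ["admin", "backup", "login", "images"]
hits = dirbust("http://target.example", wordlist, fake_probe)
for url, status in hits:
    print(status, url)
```

Every entry costs one request whether or not it is relevant to the target, which is the expense the paper aims to reduce. In authorized testing, `fake_probe` would be replaced by a function that issues a real request and returns its status code.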
Taxonomy
Topics: Advanced Malware Detection Techniques · Web Application Security Vulnerabilities · Software Engineering Research
