A Preliminary Study on Using Large Language Models in Software Pentesting
Kumar Shashwat, Francis Hahn, Xinming Ou, Dmitry Goldgof, Lawrence Hall, Jay Ligatti, S. Raj Rajgopalan, Armin Ziaie Tabari

TL;DR
This study explores using large language models for automated software vulnerability detection, demonstrating that prompt engineering and iterative interaction can enhance their effectiveness in security testing.
Contribution
It introduces a method for improving LLM-based security testing through prompt engineering and evaluates its effectiveness on a benchmark dataset.
Findings
Prompt engineering improves LLM performance in vulnerability detection.
LLMs outperform traditional static analysis tools like SonarQube.
Performance gains are consistent across different LLMs.
Abstract
Large language models (LLMs) are perceived to offer promising potential for automating security tasks, such as those found in security operation centers (SOCs). As a first step towards evaluating this perceived potential, we investigate the use of LLMs in software pentesting, where the main task is to automatically identify software security vulnerabilities in source code. We hypothesize that an LLM-based AI agent can be improved over time for a specific security task as human operators interact with it. Such improvement can be made, as a first step, by engineering prompts fed to the LLM based on the responses produced, to include relevant contexts and structures so that the model provides more accurate results. Such engineering efforts become sustainable if the prompts that are engineered to produce better results on current tasks also produce better results on future unknown tasks.…
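The prompt-refinement loop the abstract describes can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the `PromptTemplate` class, its `refine` and `render` methods, and the example wording are hypothetical and are not taken from the paper, and no real LLM API is called.

```python
from dataclasses import dataclass, field


@dataclass
class PromptTemplate:
    """Hypothetical container for an engineered prompt (not the paper's code)."""

    task: str
    context_notes: list[str] = field(default_factory=list)

    def refine(self, note: str) -> "PromptTemplate":
        # Return a new template with one more context note; the original is unchanged.
        return PromptTemplate(self.task, self.context_notes + [note])

    def render(self, source_code: str) -> str:
        # Assemble the task, the accumulated context, and the code under review.
        parts = [self.task]
        if self.context_notes:
            parts.append("Context:")
            parts.extend(f"- {note}" for note in self.context_notes)
        parts.append("Code:")
        parts.append(source_code)
        return "\n".join(parts)


# Illustrative use: an operator adds context after reviewing an earlier response.
base = PromptTemplate("Identify any security vulnerability in the code below.")
refined = base.refine("Treat all HTTP request parameters as attacker-controlled.")
print(refined.render('query = "SELECT * FROM users WHERE id = " + user_id'))
```

Keeping each refinement as a new immutable-style template is one way to reuse a prompt engineered on current tasks for later, unseen tasks, which is the sustainability condition the abstract states.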
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software Reliability and Analysis Research
Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Dense Connections · Linear Warmup With Cosine Annealing · Weight Decay · Dropout · Attention Dropout
