A Preliminary Study on Using Large Language Models in Software Pentesting

Kumar Shashwat; Francis Hahn; Xinming Ou; Dmitry Goldgof; Lawrence Hall; Jay Ligatti; S. Raj Rajgopalan; Armin Ziaie Tabari

arXiv:2401.17459 · cs.CR · February 1, 2024 · 1 citation

Open Access

TL;DR

This study explores using large language models for automated software vulnerability detection, demonstrating that prompt engineering and iterative interaction can enhance their effectiveness in security testing.

Contribution

It introduces a method for improving LLM-based security testing through prompt engineering and evaluates its effectiveness on a benchmark dataset.

Findings

01. Prompt engineering improves LLM performance in vulnerability detection.

02. LLMs outperform traditional static analysis tools like SonarQube.

03. Performance gains are consistent across different LLMs.

Abstract

Large language models (LLMs) are perceived to offer promising potential for automating security tasks, such as those found in security operations centers (SOCs). As a first step towards evaluating this perceived potential, we investigate the use of LLMs in software pentesting, where the main task is to automatically identify software security vulnerabilities in source code. We hypothesize that an LLM-based AI agent can be improved over time for a specific security task as human operators interact with it. Such improvement can be made, as a first step, by engineering the prompts fed to the LLM, based on the responses it produces, to include relevant context and structure so that the model provides more accurate results. Such engineering efforts become sustainable if the prompts engineered to produce better results on current tasks also produce better results on future unknown tasks.…
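The sustainability question raised in the abstract (does a prompt tuned on current tasks also help on unseen ones?) can be illustrated with a small evaluation harness. The following is a minimal sketch under stated assumptions, not the authors' method: the two prompt templates, the toy code samples, and `stub_model` are all hypothetical, and `stub_model` is a hard-coded stand-in for an LLM call that is deliberately rigged so the engineered prompt scores higher. The numbers it yields carry no empirical meaning; only the shape of the comparison matters.

```python
from typing import Callable, List, Tuple

# Each sample is (source_code, is_vulnerable). Toy data, not from the paper.
Sample = Tuple[str, bool]

# Hypothetical prompt templates: a bare question versus one that adds
# context (weakness classes to check) and structure (a fixed answer format).
BASELINE_PROMPT = "Is the following code vulnerable? Answer YES or NO.\n\n{code}"
ENGINEERED_PROMPT = (
    "You are reviewing source code for security vulnerabilities.\n"
    "Check for injection, unsafe buffer handling, and hard-coded secrets.\n"
    "Answer with exactly one word, YES or NO.\n\n"
    "Code:\n{code}"
)


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call (assumption, not a real API).

    It always flags strcpy, but flags string-built SQL only when the
    prompt mentions injection, mimicking a model that benefits from context.
    """
    guided = "injection" in prompt
    if "strcpy" in prompt:
        return "YES"
    if guided and "SELECT" in prompt and "+" in prompt:
        return "YES"
    return "NO"


def accuracy(model: Callable[[str], str], template: str,
             samples: List[Sample]) -> float:
    """Fraction of samples whose YES/NO reply matches the label."""
    correct = 0
    for code, label in samples:
        reply = model(template.format(code=code))
        predicted = reply.strip().upper().startswith("YES")
        correct += int(predicted == label)
    return correct / len(samples)


# "Current" tasks are the ones the prompt was tuned against;
# "future" tasks are held out and never consulted during tuning.
current_tasks: List[Sample] = [
    ("strcpy(buf, user_input);", True),
    ('query = "SELECT * FROM users WHERE id=" + user_id', True),
    ('print("hello")', False),
]
future_tasks: List[Sample] = [
    ('sql = "SELECT name FROM t WHERE n=" + name', True),
    ("total = a + b", False),
]

if __name__ == "__main__":
    for name, template in [("baseline", BASELINE_PROMPT),
                           ("engineered", ENGINEERED_PROMPT)]:
        cur = accuracy(stub_model, template, current_tasks)
        fut = accuracy(stub_model, template, future_tasks)
        print(f"{name}: current={cur:.2f} future={fut:.2f}")
```

Keeping the future set out of the tuning loop is the design point: a prompt that improves only on the samples it was tuned against would not support the sustainability claim. In practice `stub_model` would be replaced by a call to an actual model.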

Taxonomy

Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software Reliability and Analysis Research

Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Dense Connections · Linear Warmup With Cosine Annealing · Weight Decay · Dropout · Attention Dropout