Kumar Shashwat, Francis Hahn, Xinming Ou, Dmitry Goldgof, Jay Ligatti, Larrence Hall (University of South Florida), S. Raj Rajagoppalan (Resideo), Armin Ziaie Tabari (CipherArmor)

Large language models (LLM) are perceived to offer promising  potentials for automating security tasks, such as those found in security operation centers (SOCs). As a first step towards evaluating this perceived potential, we investigate the use of LLMs in software pentesting, where the main task is to automatically identify software security vulnerabilities in source code. We hypothesize that an LLM-based AI agent can be improved over time for a specific security task as human operators interact with it. Such improvement can be made, as a first step, by engineering prompts fed to the LLM based on the responses produced, to include relevant contexts and structures so that the model provides more accurate results. Such engineering efforts become sustainable if the prompts that are engineered to produce better results on current tasks, also produce better results on future unknown tasks. To examine this hypothesis, we utilize the OWASP Benchmark Project 1.2 which contains 2,740 hand-crafted source code test cases containing various types of vulnerabilities. We divide the test cases into training and testing data, where we engineer the prompts based on the training data (only), and evaluate the final system on the testing data. We compare the AI agent’s performance on the testing data against the performance of the agent without the prompt engineering. We also compare the AI agent’s results against those from SonarQube, a widely used static code analyzer for security testing. We built and tested multiple versions of the AI agent using different off-the-shelf LLMs – Google’s Gemini-pro, as well as OpenAI’s GPT-3.5-Turbo and GPT-4-Turbo (with both chat completion and assistant APIs). The results show that using LLMs is a viable approach to build an AI agent for software pentesting that can improve through repeated use and prompt engineering.

View More Papers

Differentially Private Dataset Condensation

Tianhang Zheng (University of Missouri-Kansas City), Baochun Li (University of Toronto)

Read More

BGP-iSec: Improved Security of Internet Routing Against Post-ROV Attacks

Cameron Morris (University of Connecticut), Amir Herzberg (University of Connecticut), Bing Wang (University of Connecticut), Samuel Secondo (University of Connecticut)

Read More

From Hardware Fingerprint to Access Token: Enhancing the Authentication...

Yue Xiao (Wuhan University), Yi He (Tsinghua University), Xiaoli Zhang (Zhejiang University of Technology), Qian Wang (Wuhan University), Renjie Xie (Tsinghua University), Kun Sun (George Mason University), Ke Xu (Tsinghua University), Qi Li (Tsinghua University)

Read More

HEIR: A Unified Representation for Cross-Scheme Compilation of Fully...

Song Bian (Beihang University), Zian Zhao (Beihang University), Zhou Zhang (Beihang University), Ran Mao (Beihang University), Kohei Suenaga (Kyoto University), Yier Jin (University of Science and Technology of China), Zhenyu Guan (Beihang University), Jianwei Liu (Beihang University)

Read More