Marius Vangeli (KTH Royal Institute of Technology, Sweden), Joel Brynielsson (KTH Royal Institute of Technology, Sweden and FOI Swedish Defence Research Agency, Sweden), Mika Cohen (KTH Royal Institute of Technology, Sweden and FOI Swedish Defence Research Agency, Sweden), Farzad Kamrani (FOI Swedish Defence Research Agency, Sweden)

While large language model (LLM)-driven penetration testing is rapidly improving, autonomous agents still struggle with longer-duration multi-stage exploits. As agents perform reconnaissance, attempt exploits, and pivot through systems, the token context window fills up with exploration and failed attempts, degrading decision quality. We introduce context handoff for autonomous penetration testing (CHAP), a context-relay system for LLM-driven agents. CHAP enables agents to sustain long-running penetration tests by transferring accumulated knowledge as compact protocols to fresh agent instances.
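The handoff idea can be illustrated with a toy sketch. This is not the authors' implementation: the `Agent` class, the `FINDING:` prefix convention, the word-count token estimate, and the `build_protocol` filter are all hypothetical stand-ins, and a real system would presumably have the LLM write the compact protocol itself. The sketch only shows the control flow: once an agent's context exceeds an assumed budget, a fresh instance is started with the distilled findings and nothing else.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical agent instance with a growing context (not CHAP's actual class)."""
    budget: int  # assumed per-instance token budget
    context: list = field(default_factory=list)

    def used(self) -> int:
        # Crude token estimate: count whitespace-separated words.
        return sum(len(entry.split()) for entry in self.context)

    def observe(self, entry: str) -> None:
        self.context.append(entry)


def build_protocol(context):
    """Stand-in for a compact handoff protocol: keep only entries marked as findings.

    Assumption: useful knowledge is tagged with a 'FINDING:' prefix; failed
    attempts and exploration noise are dropped.
    """
    return [entry for entry in context if entry.startswith("FINDING:")]


def run_with_handoff(events, budget):
    """Feed events to an agent, relaying to a fresh instance when the budget is exceeded."""
    agent = Agent(budget)
    handoffs = 0
    for event in events:
        agent.observe(event)
        if agent.used() > budget:
            # Hand off: the fresh instance starts from the compact protocol only.
            agent = Agent(budget, context=build_protocol(agent.context))
            handoffs += 1
    return agent, handoffs


events = [
    "FINDING: port 22 open",
    "tried default creds, failed",
    "tried exploit A, failed",
    "FINDING: web app on 8080",
]
final_agent, handoffs = run_with_handoff(events, budget=10)
print(handoffs, final_agent.context)
```

In this toy run the third event pushes the context past the budget, so the failed attempts are discarded and only the recorded findings travel to the next instance.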

We evaluate CHAP on an extended version of the AutoPen-Bench benchmark, targeting 11 real-world vulnerabilities. CHAP improved per-run success from 27.3% to 36.4% while reducing token expenditure by 32.4% compared to a baseline agent. We release our full implementation, benchmark enhancements, and a dataset of command logs with LLM reasoning traces.
