Jef Jacobs (DistriNet, KU Leuven), Jorn Lapon (DistriNet, KU Leuven), Vincent Naessens (DistriNet, KU Leuven)

Large Language Models (LLMs) are increasingly used as autonomous agents in domains such as cybersecurity and system administration. The performance of these agents depends heavily on their ability to interact effectively with operating systems, often through Bash commands. Current implementations primarily rely on proprietary cloud-based models, which raise privacy and data confidentiality concerns when deployed in real-world environments. Locally hosted open-source LLMs offer a promising alternative, but their performance for such tasks remains unclear.

This paper presents an empirical evaluation of 22 open-source language models (ranging from 1B to 32B parameters) on Natural Language–to–Bash (NL2Bash) translation tasks. We introduce an improved scoring system for assessing task success and analyze performance under 10 distinct prompting techniques. Our findings show that Qwen3 models achieve strong results in NL2Bash tasks, that role-play prompting significantly benefits most models, and that Chain-of-Thought and RAG can, surprisingly, hurt local model performance if not carefully designed. We further observe that the impact of prompting strategies varies with model size.
