Gelei Deng (Nanyang Technological University), Yi Liu (Nanyang Technological University), Yuekang Li (University of New South Wales), Kailong Wang (Huazhong University of Science and Technology), Ying Zhang (Virginia Tech), Zefeng Li (Nanyang Technological University), Haoyu Wang (Huazhong University of Science and Technology), Tianwei Zhang (Nanyang Technological University), Yang Liu (Nanyang Technological University)

Large language models (LLMs), such as chatbots, have made significant strides in various fields but remain vulnerable to jailbreak attacks, which aim to elicit inappropriate responses. Despite efforts to identify these weaknesses, current strategies are ineffective against mainstream LLM chatbots, mainly due to undisclosed defensive measures by service providers. Our paper introduces MASTERKEY, a framework exploring the dynamics of jailbreak attacks and countermeasures. We present a novel method based on time-based characteristics to dissect LLM chatbot defenses. This technique, inspired by time-based SQL injection, uncovers the workings of these defenses and demonstrates a proof-of-concept attack on several LLM chatbots.

Additionally, MASTERKEY features an innovative approach for automatically generating jailbreak prompts that target well-defended LLM chatbots. By fine-tuning an LLM with jailbreak prompts, we create attacks with a 21.58% success rate, significantly higher than the 7.33% achieved by existing methods. We have informed service providers of these findings, highlighting the urgent need for stronger defenses. This work not only reveals vulnerabilities in LLMs but also underscores the importance of robust defenses against such attacks.

View More Papers

CAGE: Complementing Arm CCA with GPU Extensions

Chenxu Wang (Southern University of Science and Technology (SUSTech) and The Hong Kong Polytechnic University), Fengwei Zhang (Southern University of Science and Technology (SUSTech)), Yunjie Deng (Southern University of Science and Technology (SUSTech)), Kevin Leach (Vanderbilt University), Jiannong Cao (The Hong Kong Polytechnic University), Zhenyu Ning (Hunan University), Shoumeng Yan (Ant Group), Zhengyu He (Ant…

Read More

The Impact of Workload on Phishing Susceptibility: An Experiment

Sijie Zhuo (University of Auckland), Robert Biddle (University of Auckland and Carleton University, Ottawa), Lucas Betts, Nalin Asanka Gamagedara Arachchilage, Yun Sing Koh, Danielle Lottridge, Giovanni Russello (University of Auckland)

Read More

CAN-MIRGU: A Comprehensive CAN Bus Attack Dataset from Moving...

Sampath Rajapaksha, Harsha Kalutarage (Robert Gordon University, UK), Garikayi Madzudzo (Horiba Mira Ltd, UK), Andrei Petrovski (Robert Gordon University, UK), M.Omar Al-Kadri (University of Doha for Science and Technology)

Read More

Measuring the Prevalence of Password Manager Issues Using In-Situ...

Adryana Hutchinson (The George Washington University), Jinwei Tang (Clark University), Adam Aviv (The George Washington University), Peter Story (Clark University)

Read More