Gelei Deng, Yi Liu (Nanyang Technological University), Yuekang Li (The University of New South Wales), Wang Kailong(Huazhong University of Science and Technology), Tianwei Zhang, Yang Liu (Nanyang Technological University)

Large Language Models (LLMs) have gained immense popularity and are being increasingly applied in various domains. Consequently, ensuring the security of these models is of paramount importance. Jailbreak attacks, which manipulate LLMs to generate malicious content, are recognized as a significant vulnerability. While existing research has predominantly focused on direct jailbreak attacks on LLMs, there has been limited exploration of indirect methods. The integration of various plugins into LLMs, notably Retrieval Augmented Generation (RAG), which enables LLMs to incorporate external knowledge bases into their response generation such as GPTs, introduces new avenues for indirect jailbreak attacks.

To fill this gap, we investigate indirect jailbreak attacks on LLMs, particularly GPTs, introducing a novel attack vector named Retrieval Augmented Generation Poisoning. This method, PANDORA, exploits the synergy between LLMs and RAG through prompt manipulation to generate unexpected responses. PANDORA uses maliciously crafted content to influence the RAG process, effectively initiating jailbreak attacks. Our preliminary tests show that PANDORA successfully conducts jailbreak attacks in four different scenarios, achieving higher success rates than direct attacks, with 64.3% for GPT-3.5 and 34.8% for GPT-4.

View More Papers

Predictive Context-sensitive Fuzzing

Pietro Borrello (Sapienza University of Rome), Andrea Fioraldi (EURECOM), Daniele Cono D'Elia (Sapienza University of Rome), Davide Balzarotti (Eurecom), Leonardo Querzoni (Sapienza University of Rome), Cristiano Giuffrida (Vrije Universiteit Amsterdam)

Read More

SyzBridge: Bridging the Gap in Exploitability Assessment of Linux...

Xiaochen Zou (UC Riverside), Yu Hao (UC Riverside), Zheng Zhang (UC RIverside), Juefei Pu (UC RIverside), Weiteng Chen (Microsoft Research, Redmond), Zhiyun Qian (UC Riverside)

Read More

AAKA: An Anti-Tracking Cellular Authentication Scheme Leveraging Anonymous Credentials

Hexuan Yu (Virginia Polytechnic Institute and State University), Changlai Du (Virginia Polytechnic Institute and State University), Yang Xiao (University of Kentucky), Angelos Keromytis (Georgia Institute of Technology), Chonggang Wang (InterDigital), Robert Gazda (InterDigital), Y. Thomas Hou (Virginia Polytechnic Institute and State University), Wenjing Lou (Virginia Polytechnic Institute and State University)

Read More

AdvCAPTCHA: Creating Usable and Secure Audio CAPTCHA with Adversarial...

Hao-Ping (Hank) Lee (Carnegie Mellon University), Wei-Lun Kao (National Taiwan University), Hung-Jui Wang (National Taiwan University), Ruei-Che Chang (University of Michigan), Yi-Hao Peng (Carnegie Mellon University), Fu-Yin Cherng (National Chung Cheng University), Shang-Tse Chen (National Taiwan University)

Read More