Guanlong Wu (SUSTech), Taojie Wang (SUSTech), Yao Zhang (ByteDance Inc.), Zheng Zhang (SUSTech), Jianyu Niu (SUSTech), Ye Wu (ByteDance Inc.), Yinqian Zhang (SUSTech)

The emergence of large language models (LLMs) has enabled a wide range of applications, including code generation, chatbots, and AI agents. However, deploying these applications faces substantial challenges in terms of cost and efficiency. One notable optimization to address these challenges is semantic caching, which reuses query-response pairs across users based on semantic similarity. This mechanism has gained significant traction in both academia and industry and has been integrated into the LLM serving infrastructure of cloud providers such as Azure, AWS, and Alibaba. This paper is the first to show that semantic caching is vulnerable to cache poisoning attacks, where an attacker injects crafted cache entries to cause others to receive attacker-defined responses. We demonstrate the semantic cache poisoning attack in diverse scenarios and confirm its practicality across all three major public clouds. Building on the attack, we evaluate existing adversarial prompting defenses and find they are ineffective against semantic cache poisoning. We therefore propose a new defense mechanism that demonstrates improved protection compared to existing approaches, though complete mitigation remains challenging. Our study reveals that cache poisoning, a long-standing security concern, has re-emerged in LLM systems. While our analysis focuses on semantic caching, the underlying risks may extend to other types of caching mechanisms used in LLM systems.
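
To make the mechanism concrete, the following is a minimal toy sketch of a shared semantic cache and of how a poisoned entry could reach a second user. It is not the paper's attack or any cloud provider's implementation. The bag-of-words `embed` function stands in for a real embedding model, and the class `SemanticCache`, the helper `answer`, the 0.8 threshold, the example queries, and the `POISON` string are all hypothetical. How an attacker gets a chosen response stored in the first place is outside this sketch, so the poisoned entry is inserted directly.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    """Hypothetical cache shared by all users, keyed by query similarity."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold  # assumed value, chosen for illustration
        self.entries = []  # list of (embedding, response) pairs

    def lookup(self, query: str):
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1]
        return None

    def insert(self, query: str, response: str) -> None:
        self.entries.append((embed(query), response))


def answer(cache: SemanticCache, query: str, llm) -> str:
    """Serve from the cache on a semantic hit; otherwise call the model and store."""
    cached = cache.lookup(query)
    if cached is not None:
        return cached
    response = llm(query)
    cache.insert(query, response)
    return response


def honest_llm(query: str) -> str:
    """Placeholder for a real model call."""
    return "honest answer to: " + query


POISON = "attacker-defined response"
cache = SemanticCache()

# Stand-in for the attacker's crafted interaction that leaves a malicious
# response cached under a query close to what victims are likely to ask.
cache.insert("how do i reset my password", POISON)

# A different user asks a semantically similar question and hits that entry.
victim_reply = answer(cache, "how do i reset my password please", honest_llm)
print(victim_reply)
```

In this sketch the victim's query never reaches the model, because the lookup is decided only by similarity to entries that any user may have created. That sharing across users is the property the abstract identifies as the attack surface.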
