XiangFan Wu (Ocean University of China; QI-ANXIN Technology Research Institute), Lingyun Ying (QI-ANXIN Technology Research Institute), Guoqiang Chen (QI-ANXIN Technology Research Institute), Yacong Gu (Tsinghua University; Tsinghua University-QI-ANXIN Group JCNS), Haipeng Qu (Department of Computer Science and Technology, Ocean University of China)

Large Language Models (LLMs) are rapidly reshaping digital interactions. Their performance and efficiency are critically dependent on advanced caching mechanisms, such as prefix caching and semantic caching.
However, these mechanisms introduce a new attack surface. Unlike prior work focused on LLMs poisoning attacks during the training phase, this paper presents the first comprehensive investigation into cache-related security risks that arise during the LLM inference-time.

We conducted a systematic study of the cache implementations in mainstream LLM serving frameworks and then identified six novel attack vectors categorized as: (1) User-oriented Fraud Attacks, which manipulate cache entries to deliver malicious content to users via prefix cache collisions and semantic fuzzy poisoning; and (2) System Integrity Attacks, which exploit cache vulnerabilities to bypass security checks, such as using block-wise or multimodal collisions to evade content moderation.
Our experiments on leading open-source frameworks validated these attack vectors and evaluated their impact and cost.
Furthermore, we proposed five multilayer defense strategies and assessed their effectiveness.
We responsibly disclosed our findings to affected vendors, including vLLM, SGLang, GPTCache, AIBrix, rtp-llm and LMDeploy. All of them have acknowledged the vulnerabilities, and notably, vLLM, GPTCache, and AIBrix have adopted our proposed mitigation methods and fixed their vulnerabilities.
Our findings underscore the importance of secure the caching infrastructure in the rapidly expanding LLM ecosystem.

View More Papers

Memory Band-Aid: A Principled Rowhammer Defense-in-Depth

Carina Fiedler (Graz University of Technology), Jonas Juffinger (Graz University of Technology), Sudheendra Raghav Neela (Graz University of Technology), Martin Heckel (Hof University of Applied Sciences), Hannes Weissteiner (Graz University of Technology), Abdullah Giray Yağlıkçı (ETH Zürich), Florian Adamsky (Hof University of Applied Sciences), Daniel Gruss (Graz University of Technology)

Read More

Work-in-progress: Uncovering the Invisible: A Large-Scale Analysis of Service...

Sivakanesan Dhanushkanda (Old Dominion University), Mustafa Ibrahim (Old Dominion University), Shuai Hao (Old Dominion University)

Read More

From Matrix to Metrics: Introducing and Applying a Configuration...

Tobias Länge (SECUSO, Karlsruhe Institute of Technology, Karlsruhe, Germany), Fabian Lucas Ballreich (SECUSO, Karlsruhe Institute of Technology, Karlsruhe, Germany), Anne Hennig (SECUSO, Karlsruhe Institute of Technology, Karlsruhe, Germany), Peter Mayer (SECUSO, Karlsruhe Institute of Technology, Karlsruhe, Germany), Melanie Volkamer (SECUSO, Karlsruhe Institute of Technology, Karlsruhe, Germany)

Read More