Cache Me, Catch You: Cache Related Security Threats in LLM Serving Frameworks

XiangFan Wu (Ocean University of China; QI-ANXIN Technology Research Institute), Lingyun Ying (QI-ANXIN Technology Research Institute), Guoqiang Chen (QI-ANXIN Technology Research Institute), Yacong Gu (Tsinghua University; Tsinghua University-QI-ANXIN Group JCNS), Haipeng Qu (Department of Computer Science and Technology, Ocean University of China)

Large Language Models (LLMs) are rapidly reshaping digital interactions. Their performance and efficiency are critically dependent on advanced caching mechanisms, such as prefix caching and semantic caching.
However, these mechanisms introduce a new attack surface. Unlike prior work focused on LLMs poisoning attacks during the training phase, this paper presents the first comprehensive investigation into cache-related security risks that arise during the LLM inference-time.

We conducted a systematic study of the cache implementations in mainstream LLM serving frameworks and then identified six novel attack vectors categorized as: (1) User-oriented Fraud Attacks, which manipulate cache entries to deliver malicious content to users via prefix cache collisions and semantic fuzzy poisoning; and (2) System Integrity Attacks, which exploit cache vulnerabilities to bypass security checks, such as using block-wise or multimodal collisions to evade content moderation.
Our experiments on leading open-source frameworks validated these attack vectors and evaluated their impact and cost.
Furthermore, we proposed five multilayer defense strategies and assessed their effectiveness.
We responsibly disclosed our findings to affected vendors, including vLLM, SGLang, GPTCache, AIBrix, rtp-llm and LMDeploy. All of them have acknowledged the vulnerabilities, and notably, vLLM, GPTCache, and AIBrix have adopted our proposed mitigation methods and fixed their vulnerabilities.
Our findings underscore the importance of secure the caching infrastructure in the rapidly expanding LLM ecosystem.

Paper

View More Papers

WiFinger: Fingerprinting Noisy IoT Event Traffic Using Packet-level Sequence...

Ronghua Li (The Hong Kong Polytechnic University), Shinan Liu (The University of Hong Kong), Haibo Hu (The Hong Kong Polytechnic University, PolyU Research Centre for Privacy and Security Technologies in Future Smart Systems), Qingqing Ye (The Hong Kong Polytechnic University), Nick Feamster (University of Chicago)

The 1-RTT Penalty: Quantifying the Recurring Cost of PQC...

Young Eun Kwon, Ji Won Yoon (Korea University)

Abuse Resistant Traceability with Minimal Trust for Encrypted Messaging...

Zhongming Wang (Chongqing University), Tao Xiang (Chongqing University), Xiaoguo Li (Chongqing University), Guomin Yang (Singapore Management University), Biwen Chen (Chongqing University), Ze Jiang (Chongqing University), Jiacheng Wang (Nanyang Technological University), Chuan Ma (Chongqing University), Robert H. Deng (Singapore Management University)