Yinan Zhong (Zhejiang University), Qianhao Miao (Zhejiang University), Yanjiao Chen (Zhejiang University), Jiangyi Deng (Zhejiang University), Yushi Cheng (Zhejiang University), Wenyuan Xu (Zhejiang University)

Large Language Models (LLMs) have been integrated into many applications (e.g., web agents) to perform more sophisticated tasks. However, LLM-empowered applications are vulnerable to Indirect Prompt Injection (IPI) attacks, where instructions are injected via untrustworthy external data sources. This paper presents Rennervate, a defense framework to detect and prevent IPI attacks. Rennervate leverages attention features to detect covert injections at a fine-grained token level, enabling precise sanitization that neutralizes IPI attacks while preserving LLM functionality. Specifically, the token-level detector is realized with a 2-step attentive pooling mechanism, which aggregates attention heads and response tokens for IPI detection and sanitization. Moreover, we establish a fine-grained IPI dataset, FIPI, to be open-sourced to support further research. Extensive experiments verify that Rennervate outperforms 15 commercial and academic IPI defense methods, achieving high precision on 5 LLMs and 6 datasets. We also demonstrate that Rennervate is transferable to unseen attacks and robust against adaptive adversaries.
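The abstract does not spell out the detector's implementation; as a rough illustration only, a 2-step attentive pooling over attention features might look like the sketch below. All shapes, weights, and function names here are assumptions for illustration, not the authors' actual design: step 1 pools each input token's attention features across attention heads, and step 2 pools the result across response tokens, yielding one feature vector per input token that a classifier could score as injected or benign.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def two_step_attentive_pooling(attn_feats, w_head, w_tok):
    """Hypothetical 2-step attentive pooling for one input token.

    attn_feats: (T, H, d) attention-derived features for one input token,
                collected over T response tokens and H attention heads.
    w_head, w_tok: (d,) learned scoring vectors (random here, for illustration).
    Returns a single (d,) feature vector for token-level IPI classification.
    """
    # Step 1: attentive pooling over attention heads
    head_scores = attn_feats @ w_head                               # (T, H)
    head_weights = softmax(head_scores, axis=1)                     # (T, H)
    per_token = (head_weights[..., None] * attn_feats).sum(axis=1)  # (T, d)

    # Step 2: attentive pooling over response tokens
    tok_scores = per_token @ w_tok                                  # (T,)
    tok_weights = softmax(tok_scores, axis=0)                       # (T,)
    return (tok_weights[:, None] * per_token).sum(axis=0)           # (d,)

# Toy usage with random features and weights
rng = np.random.default_rng(0)
T, H, d = 12, 8, 16
feats = rng.normal(size=(T, H, d))
pooled = two_step_attentive_pooling(feats, rng.normal(size=d), rng.normal(size=d))
print(pooled.shape)  # (16,)
```

In this reading, the pooled vector summarizes how the model attended to a given input token while generating its response, so a lightweight classifier on top can flag and sanitize individual injected tokens rather than rejecting the whole input.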
