Chengkun Wei (Zhejiang University), Wenlong Meng (Zhejiang University), Zhikun Zhang (CISPA Helmholtz Center for Information Security and Stanford University), Min Chen (CISPA Helmholtz Center for Information Security), Minghu Zhao (Zhejiang University), Wenjing Fang (Ant Group), Lei Wang (Ant Group), Zihui Zhang (Zhejiang University), Wenzhi Chen (Zhejiang University)

*Prompt-tuning* has emerged as an attractive paradigm for deploying large-scale language models due to its strong downstream task performance and efficient multitask serving ability. Despite its wide adoption, we empirically show that prompt-tuning is vulnerable to downstream task-agnostic backdoors, which reside in the pretrained models and can affect arbitrary downstream tasks. The state-of-the-art backdoor detection approaches cannot defend against task-agnostic backdoors since they hardly converge in reversing the backdoor triggers. To address this issue, we propose LMSanitator, a novel approach for detecting and removing task-agnostic backdoors on Transformer models. Instead of directly inverting the triggers, LMSanitator aims to invert the *predefined attack vectors* (pretrained models' output when the input is embedded with triggers) of the task-agnostic backdoors, which achieves much better convergence performance and backdoor detection accuracy. LMSanitator further leverages prompt-tuning’s property of freezing the pretrained model to perform accurate and fast output monitoring and input purging during the inference phase. Extensive experiments on multiple language models and NLP tasks illustrate the effectiveness of LMSanitator. For instance, LMSanitator achieves 92.8% backdoor detection accuracy on 960 models and decreases the attack success rate to less than 1% in most scenarios.

View More Papers

Unus pro omnibus: Multi-Client Searchable Encryption via Access Control

Jiafan Wang (Data61, CSIRO), Sherman S. M. Chow (The Chinese University of Hong Kong)

Read More

CAGE: Complementing Arm CCA with GPU Extensions

Chenxu Wang (Southern University of Science and Technology (SUSTech) and The Hong Kong Polytechnic University), Fengwei Zhang (Southern University of Science and Technology (SUSTech)), Yunjie Deng (Southern University of Science and Technology (SUSTech)), Kevin Leach (Vanderbilt University), Jiannong Cao (The Hong Kong Polytechnic University), Zhenyu Ning (Hunan University), Shoumeng Yan (Ant Group), Zhengyu He (Ant…

Read More

Understanding the Implementation and Security Implications of Protective DNS...

Mingxuan Liu (Zhongguancun Laboratory; Tsinghua University), Yiming Zhang (Tsinghua University), Xiang Li (Tsinghua University), Chaoyi Lu (Tsinghua University), Baojun Liu (Tsinghua University), Haixin Duan (Tsinghua University; Zhongguancun Laboratory), Xiaofeng Zheng (Institute for Network Sciences and Cyberspace, Tsinghua University; QiAnXin Technology Research Institute & Legendsec Information Technology (Beijing) Inc.)

Read More

HistCAN: A real-time CAN IDS with enhanced historical traffic...

Shuguo Zhuo, Nuo Li, Kui Ren (The State Key Laboratory of Blockchain and Data Security, Zhejiang University)

Read More