Mengyuan Sun (Wuhan University), Yu Li (Wuhan University), Yunjie Ge (Wuhan University), Yuchen Liu (Wuhan University), Bo Du (Wuhan University), Qian Wang (Wuhan University)

Multimodal contrastive learning models like CLIP have demonstrated remarkable vision-language alignment capabilities and now serve as foundational components in many large-scale multimodal systems. However, their vulnerability to backdoor attacks poses critical security risks. Attackers can implant latent triggers that persist through downstream tasks, enabling malicious control of model behavior upon trigger presentation. Although recent defense mechanisms have achieved notable success, they remain impractical because they rely on strong assumptions about attacker knowledge or require excessive amounts of clean data.

In this paper, we introduce InverTune, the first backdoor defense framework for multimodal models that operates under minimal assumptions about the attacker, requiring neither prior knowledge of attack targets nor access to the poisoned dataset. Unlike existing defense methods that rely on the same dataset used in the poisoning stage, InverTune identifies and removes backdoor artifacts through three key components, achieving robust protection against backdoor attacks. Specifically, (1) InverTune first exposes attack signatures through adversarial simulation, probabilistically identifying the target label by analyzing model response patterns. (2) Building on this, we develop a gradient inversion technique to reconstruct latent triggers through activation pattern analysis. (3) Finally, a clustering-guided fine-tuning strategy erases the backdoor function with only a small amount of arbitrary clean data while preserving the model's original capabilities. Experimental results show that InverTune reduces the average attack success rate (ASR) by 97.87% against state-of-the-art (SOTA) attacks while limiting clean accuracy (CA) degradation to just 3.07%.
This work establishes a new paradigm for securing multimodal systems, advancing security in foundation model deployment without compromising performance.
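To make the gradient-inversion step (2) concrete, the sketch below shows the general idea in a heavily simplified toy setting: an additive trigger is optimized by gradient descent so that any input stamped with it embeds near a target direction. Everything here is illustrative and not from the paper: the single linear map standing in for a frozen CLIP image encoder, the randomly chosen `target_emb` (InverTune first estimates the target label via adversarial simulation), the MSE objective toward a scaled target (a stand-in for the paper's activation-pattern analysis), and all hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a frozen image encoder: one fixed linear map
# (real CLIP encoders are deep ViTs/ResNets; this only shows the mechanics).
W = rng.normal(size=(16, 64))             # 64-dim "image" -> 16-dim embedding
target_emb = rng.normal(size=16)
target_emb /= np.linalg.norm(target_emb)  # unit embedding of the inferred target

# Gradient inversion: optimize an additive trigger `delta` so that clean
# inputs stamped with it embed close to the target direction.
delta = np.zeros(64)
lr, scale = 2e-3, 100.0                   # scale makes the trigger dominate
for _ in range(500):
    g = np.zeros_like(delta)
    for _ in range(8):                    # mini-batch of random "clean" inputs
        x = rng.normal(size=64)
        residual = W @ (x + delta) - scale * target_emb
        g += 2.0 * (W.T @ residual)       # grad of ||W(x+delta) - scale*t||^2
    delta -= lr * g / 8

def cos_to_target(x):
    z = W @ x
    return float(z @ target_emb / np.linalg.norm(z))

clean = rng.normal(size=64)
print(cos_to_target(clean))               # clean input: weakly aligned
print(cos_to_target(clean + delta))       # stamped input: strongly aligned
```

Once such a trigger is reconstructed, a defense can stamp it onto clean data during fine-tuning to locate and unlearn the backdoor behavior, which is the role of InverTune's clustering-guided fine-tuning stage (3).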
