Xiaoyun Xu (Radboud University), Shujian Yu (Vrije Universiteit Amsterdam), Zhuoran Liu (Radboud University), Stjepan Picek (Radboud University)

Vision Transformers (ViTs) have emerged as a fundamental architecture and serve as the backbone of modern vision-language models. Despite their impressive performance, ViTs exhibit notable vulnerability to evasion attacks, necessitating the development of specialized Adversarial Training (AT) strategies tailored to their unique architecture.
While a direct solution might involve applying existing AT methods to ViTs, our analysis reveals significant incompatibilities, particularly with state-of-the-art (SOTA) approaches such as Generalist (CVPR 2023) and DBAT (USENIX Security 2024).
This paper presents a systematic investigation of adversarial robustness in ViTs and provides a novel theoretical Mutual Information (MI) analysis of their autoencoder-based self-supervised pre-training.
Specifically, we show that MI between the adversarial example and its latent representation in ViT-based autoencoders should be constrained via derived MI bounds.
Building on this insight, we propose a self-supervised AT method, MIMIR, that employs an MI penalty to facilitate adversarial pre-training by masked image modeling with autoencoders.
Extensive experiments on CIFAR-10, Tiny-ImageNet, and ImageNet-1K show that MIMIR consistently improves both natural and robust accuracy, outperforming SOTA AT results on ImageNet-1K.
Notably, MIMIR demonstrates superior robustness against unforeseen attacks and common corruptions, and it also withstands adaptive attacks in which the adversary has full knowledge of the defense mechanism.
Our code and trained models are publicly available at: https://github.com/xiaoyunxxy/MIMIR.
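
To make the idea of an MI penalty concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say how MI is estimated, so this sketch assumes a kernel dependence measure (the biased HSIC estimator with Gaussian kernels) as a stand-in for the MI term. The names `gaussian_kernel`, `hsic`, `penalized_loss`, and the weight `lam` are hypothetical, and the adversarial examples and latent codes are random arrays rather than outputs of a real ViT autoencoder.

```python
import numpy as np

def gaussian_kernel(x, sigma):
    """Gaussian (RBF) kernel matrix for the rows of x, shape (n, d) -> (n, n)."""
    sq = np.sum(x ** 2, axis=1, keepdims=True)
    d2 = np.maximum(sq + sq.T - 2.0 * x @ x.T, 0.0)  # squared pairwise distances
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(x, z, sigma_x=2.0, sigma_z=2.0):
    """Biased HSIC estimate between samples x and z (assumed surrogate for MI)."""
    n = x.shape[0]
    K = gaussian_kernel(x, sigma_x)
    L = gaussian_kernel(z, sigma_z)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def penalized_loss(recon, clean, x_adv, latent, lam=1e-3):
    """Reconstruction error plus a weighted dependence penalty.

    recon  : decoder output for the (masked) adversarial input
    clean  : clean reconstruction target
    x_adv  : adversarial examples, flattened to (n, d)
    latent : encoder representations of x_adv, shape (n, k)
    lam    : hypothetical penalty weight
    """
    mse = np.mean((recon - clean) ** 2)
    return mse + lam * hsic(x_adv.reshape(len(x_adv), -1), latent)

# Toy data standing in for adversarial inputs and latent codes (assumption).
rng = np.random.default_rng(0)
x_adv = rng.standard_normal((128, 4))
z_dependent = x_adv + 0.1 * rng.standard_normal((128, 4))  # latent copies the input
z_independent = rng.standard_normal((128, 4))              # latent ignores the input

dep = hsic(x_adv, z_dependent)
ind = hsic(x_adv, z_independent)
loss = penalized_loss(x_adv, x_adv, x_adv, z_dependent)
```

In this toy setting, a latent code that closely tracks the adversarial input receives a larger penalty than one that is statistically independent of it, which is the direction of pressure a constraint on MI between the adversarial example and its latent representation would apply during pre-training.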
