Dung Thuy Nguyen (Vanderbilt University), Ngoc N. Tran (Vanderbilt University), Taylor T. Johnson (Vanderbilt University), Kevin Leach (Vanderbilt University)

In recent years, the rise of machine learning (ML) in cybersecurity has brought new challenges, including the increasing threat of backdoor poisoning attacks on ML malware classifiers. These attacks aim to manipulate model behavior when provided with a particular input trigger. For instance, adversaries could inject malicious samples into public malware repositories, contaminating the training data and potentially misclassifying malware by the ML model. Current countermeasures predominantly focus on detecting poisoned samples by leveraging disagreements within the outputs of a diverse set of ensemble models on training data points.
However, these methods are not applicable in scenarios involving ML-as-a-Service (MLaaS) or for users who seek to purify a backdoored model post-training. Addressing this scenario, we introduce PBP, a post-training defense for malware classifiers that mitigates various types of backdoor embeddings without assuming any specific backdoor embedding mechanism. Our method exploits the influence of backdoor attacks on the activation distribution of neural networks, independent of the trigger-embedding method.
In the presence of a backdoor attack, the activation distribution of each layer is distorted into a mixture of distributions. By regulating the statistics of the batch normalization layers, we can guide a backdoored model to perform similarly to a clean one. Our method demonstrates substantial advantages over several state-of-the-art methods, as evidenced by experiments on two datasets, two types of backdoor methods, and various attack configurations. Our experiments showcase that PBP can mitigate even the SOTA backdoor attacks for malware classifiers, e.g., Jigsaw Puzzle, which was previously demonstrated to be stealthy against existing backdoor defenses. Notably, your approach requires only a small portion of the training data --- only 1% --- to purify the backdoor and reduce the attack success rate from 100% to almost 0%, a 100-fold improvement over the baseline methods. Our code is available at https://github.com/judydnguyen/pbp-backdoor-purification-official.

View More Papers

On the Robustness of LDP Protocols for Numerical Attributes...

Xiaoguang Li (Xidian University, Purdue University), Zitao Li (Alibaba Group (U.S.) Inc.), Ninghui Li (Purdue University), Wenhai Sun (Purdue University, West Lafayette, USA)

Read More

Blindfold: Confidential Memory Management by Untrusted Operating System

Caihua Li (Yale University), Seung-seob Lee (Yale University), Lin Zhong (Yale University)

Read More

Vision: The Price Should Be Right: Exploring User Perspectives...

Jacob Hopkins (Texas A&M University - Corpus Christi), Carlos Rubio-Medrano (Texas A&M University - Corpus Christi), Cori Faklaris (University of North Carolina at Charlotte)

Read More

Blackbox Fuzzing of Distributed Systems with Multi-Dimensional Inputs and...

Yonghao Zou (Beihang University and Peking University), Jia-Ju Bai (Beihang University), Zu-Ming Jiang (ETH Zurich), Ming Zhao (Arizona State University), Diyu Zhou (Peking University)

Read More