Rui Zhu (Indiana University Bloominton), Di Tang (Indiana University Bloomington), Siyuan Tang (Indiana University Bloomington), Zihao Wang (Indiana University Bloomington), Guanhong Tao (Purdue University), Shiqing Ma (University of Massachusetts Amherst), XiaoFeng Wang (Indiana University Bloomington), Haixu Tang (Indiana University, Bloomington)

Most existing methods to detect backdoored machine learning (ML) models take one of the two approaches: trigger inversion (aka. reverse engineer) and weight analysis (aka. model diagnosis). In particular, the gradient-based trigger inversion is considered to be among the most effective backdoor detection techniques, as evidenced by the TrojAI competition, Trojan Detection Challenge and backdoorBench. However, little has been done to understand why this technique works so well and, more importantly, whether it raises the bar to the backdoor attack. In this paper, we report the first attempt to answer this question by analyzing the change rate of the backdoored model's output around its trigger-carrying inputs. Our study shows that existing attacks tend to inject the backdoor characterized by a low change rate around trigger-carrying inputs, which are easy to capture by gradient-based trigger inversion. In the meantime, we found that the low change rate is not necessary for a backdoor attack to succeed: we design a new attack enhancement method called Gradient Shaping (GRASP), which follows the opposite direction of adversarial training to reduce the change rate of a backdoored model with regard to the trigger, without undermining its backdoor effect. Also, we provide a theoretic analysis to explain the effectiveness of this new technique and the fundamental weakness of gradient-based trigger inversion. Finally, we perform both theoretical and experimental analysis, showing that the GRASP enhancement does not reduce the effectiveness of the stealthy attacks designed to evade the backdoor detection methods based on weight analysis, as well as other backdoor mitigation methods without using detection.

View More Papers

GNNIC: Finding Long-Lost Sibling Functions with Abstract Similarity

Qiushi Wu (University of Minnesota), Zhongshu Gu (IBM Research), Hani Jamjoom (IBM Research), Kangjie Lu (University of Minnesota)

Read More

Private Aggregate Queries to Untrusted Databases

Syed Mahbub Hafiz (University of California, Davis), Chitrabhanu Gupta (University of California, Davis), Warren Wnuck (University of California, Davis), Brijesh Vora (University of California, Davis), Chen-Nee Chuah (University of California, Davis)

Read More

You Can Use But Cannot Recognize: Preserving Visual Privacy...

Qiushi Li (Tsinghua University), Yan Zhang (Tsinghua University), Ju Ren (Tsinghua University), Qi Li (Tsinghua University), Yaoxue Zhang (Tsinghua University)

Read More

DorPatch: Distributed and Occlusion-Robust Adversarial Patch to Evade Certifiable...

Chaoxiang He (Huazhong University of Science and Technology), Xiaojing Ma (Huazhong University of Science and Technology), Bin B. Zhu (Microsoft Research), Yimiao Zeng (Huazhong University of Science and Technology), Hanqing Hu (Huazhong University of Science and Technology), Xiaofan Bai (Huazhong University of Science and Technology), Hai Jin (Huazhong University of Science and Technology), Dongmei Zhang…

Read More