Yansong Gao (The University of Western Australia), Huaibing Peng (Nanjing University of Science and Technology), Hua Ma (CSIRO's Data61), Zhi Zhang (The University of Western Australia), Shuo Wang (Shanghai Jiao Tong University), Rayne Holland (CSIRO's Data61), Anmin Fu (Nanjing University of Science and Technology), Minhui Xue (CSIRO's Data61), Derek Abbott (The University of Adelaide, Australia)

In the Data as a Service (DaaS) model, data curators, such as commercial providers like Amazon Mechanical Turk, Appen, and TELUS International, aggregate high-quality data from numerous contributors and sell it to deep learning (DL) model providers. However, malicious contributors can poison this data, embedding backdoors in the trained DL models. Existing methods for detecting poisoned samples face significant limitations: they often rely on reserved clean data; they are sensitive to the poisoning rate, trigger type, and backdoor type; and they are specific to classification tasks. These limitations hinder their practical adoption by data curators.

This work is the first to investigate the training trajectory of poisoned samples in the spectrum domain, revealing distinctions from benign samples that are not apparent in the original, non-spectrum domain. Building on this novel perspective, we propose TellTale, which detects and sanitizes poisoned samples as a one-time effort and addresses all of the aforementioned limitations of prior work. Extensive experiments show that TellTale defeats both universal and the more challenging partial backdoor types without relying on any reserved clean data. TellTale is also validated to be agnostic to various trigger types, including the advanced clean-label trigger attack Narcissus (CCS'2023). Moreover, TellTale is effective across diverse data modalities (e.g., image, audio, and text) and non-classification tasks (e.g., regression), making it the only known training-phase poisoned sample detection method applicable to non-classification tasks. In all our evaluations, TellTale achieves a detection accuracy (i.e., the rate at which poisoned samples are correctly identified) of at least 95.52% and a false positive rate (i.e., the rate at which benign samples are falsely flagged as poisoned) no higher than 0.61%. Comparisons with the state-of-the-art methods ASSET (Usenix'2023) and CT (Usenix'2023) further affirm TellTale's superior performance. Specifically, ASSET fails to handle partial backdoor types and incurs an unacceptably high false positive rate on clean (benign) datasets, which are common in practice, while CT fails against the Narcissus trigger. In contrast, TellTale remains highly effective in the test scenarios where prior work fails. The source code is released at https://github.com/MPaloze/Telltale.
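
The abstract does not spell out TellTale's algorithm, so the following is only a toy sketch of the general idea it names, not TellTale's method. It assumes that a sample's "training trajectory" is its per-epoch training loss, that the "spectrum domain" means the magnitudes of a discrete Fourier transform of that trajectory, and that suspicious samples are flagged with a robust median/MAD z-score. It also computes the two reported metrics under the reading that detection accuracy is the fraction of poisoned samples flagged and the false positive rate is the fraction of benign samples flagged. All function names, thresholds, and the synthetic loss curves are hypothetical and were invented solely to exercise the code; they say nothing about how real poisoned samples behave.

```python
import cmath
import math
from statistics import median


def magnitude_spectrum(trajectory):
    """One-sided DFT magnitudes (bins 0..N//2) of a mean-centred trajectory."""
    n = len(trajectory)
    mean = sum(trajectory) / n
    x = [v - mean for v in trajectory]  # remove the DC offset
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def flag_outliers(spectra, z_thresh=3.5):
    """Flag samples whose spectrum lies far from the element-wise median spectrum.

    Hypothetical rule: robust z-score (0.6745 * deviation / MAD) of the Euclidean
    distance to the median spectrum; 3.5 is a common rule-of-thumb cutoff.
    """
    n_bins = len(spectra[0])
    median_spec = [median(s[k] for s in spectra) for k in range(n_bins)]
    dist = [math.dist(s, median_spec) for s in spectra]
    med = median(dist)
    mad = median(abs(d - med) for d in dist)
    return [0.6745 * (d - med) / (mad + 1e-12) > z_thresh for d in dist]


def detection_metrics(flagged, is_poisoned):
    """Return (detection accuracy, false positive rate) for boolean flags."""
    caught = [f for f, p in zip(flagged, is_poisoned) if p]
    false_alarms = [f for f, p in zip(flagged, is_poisoned) if not p]
    return sum(caught) / len(caught), sum(false_alarms) / len(false_alarms)


# Synthetic demo only: 20 smoothly decaying "benign" loss curves and 2 curves
# with an added oscillation standing in for "poisoned" samples.
N_EPOCHS = 16
benign = [
    [2.0 * math.exp(-t / (4.0 + 0.05 * i)) for t in range(N_EPOCHS)]
    for i in range(20)
]
poisoned = [
    [v + 2.0 * math.sin(2 * math.pi * 4 * t / N_EPOCHS) for t, v in enumerate(row)]
    for row in benign[:2]
]
losses = benign + poisoned
is_poisoned = [False] * 20 + [True] * 2

flagged = flag_outliers([magnitude_spectrum(row) for row in losses])
da, fpr = detection_metrics(flagged, is_poisoned)
```

On this contrived data the oscillating curves stand out in the spectrum even though every curve decays in a broadly similar way, which is the kind of separation the abstract attributes to the spectrum view. A practical detector would need the real trajectory signal and decision rule from the released source code rather than these placeholders.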
