Eden Luzon (Ben-Gurion University, Institute of Software Systems and Security), Guy Amit (Ben-Gurion University, Institute of Software Systems and Security), Roy Weiss (Ben-Gurion University, Institute of Software Systems and Security), Torsten Krauß (University of Würzburg), Alexandra Dmitrienko (University of Würzburg), Yisroel Mirsky (Ben-Gurion University, Institute of Software Systems and Security)

Neural networks are often trained on proprietary datasets, making them attractive attack targets. We present a novel dataset extraction method leveraging an innovative training-time backdoor attack, allowing a malicious federated learning (FL) server to systematically and deterministically extract complete client training samples through a simple indexing process. Unlike prior techniques, our approach guarantees exact data recovery rather than probabilistic reconstructions or hallucinations, provides precise control over which samples are memorized and how many, and shows high capacity and robustness. Infected models output data samples when they receive a pattern-based index trigger, enabling systematic extraction of meaningful patches from each client’s local data without disrupting global model utility. To address small model output sizes, we extract patches and then recombine them.
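
The patch-and-recombine step can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `split_into_patches` and `recombine`, the patch size, and the dictionary standing in for what a backdoored model would return per index trigger are all assumptions made for illustration only.

```python
# Illustrative sketch only: a plain dict stands in for the infected model,
# mapping a (row, col) patch index to the patch it would output.

def split_into_patches(image, patch):
    """Split a 2D list into square patches keyed by (row, col) patch index."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image dimensions must be multiples of the patch size")
    return {
        (r, c): [row[c * patch:(c + 1) * patch]
                 for row in image[r * patch:(r + 1) * patch]]
        for r in range(height // patch)
        for c in range(width // patch)
    }


def recombine(patches, patch, height, width):
    """Reassemble a full 2D list from patches keyed by (row, col) index."""
    out = [[0] * width for _ in range(height)]
    for (r, c), block in patches.items():
        for i, row in enumerate(block):
            out[r * patch + i][c * patch:(c + 1) * patch] = row
    return out


# Hypothetical 8x8 "training sample" with distinct values per pixel.
image = [[y * 8 + x for x in range(8)] for y in range(8)]

# Stand-in for memorization: each index maps to one small patch.
memorized = split_into_patches(image, 4)

# Stand-in for extraction: query every index, then stitch the outputs together.
queried = {index: memorized[index] for index in sorted(memorized)}
recovered = recombine(queried, 4, 8, 8)
```

In this toy setting each lookup returns only a 4x4 patch, yet iterating over all indices recovers the 8x8 sample exactly, which mirrors the abstract's point that small output sizes can be worked around by extracting patches and recombining them.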

The attack requires only a minor modification to the training code that can easily evade detection during client-side verification. Hence, this vulnerability represents a realistic FL supply-chain threat, where a malicious server can distribute modified training code to clients and later recover private data from their updates. Evaluations across classifiers, segmentation models, and large language models demonstrate that thousands of sensitive training samples can be recovered from client models with minimal impact on task performance, and a client's entire dataset can be stolen after multiple FL rounds. For instance, a medical segmentation dataset can be extracted with only a 3% utility drop. These findings expose a critical privacy vulnerability in FL systems, emphasizing the need for stronger integrity and transparency in distributed training pipelines.
