Hao Luan (Institute of Big Data, Fudan University, Shanghai, China and College of Computer Science and Artificial Intelligence, Fudan University, Shanghai, China), Xue Tan (Institute of Big Data, Fudan University, Shanghai, China and College of Computer Science and Artificial Intelligence, Fudan University, Shanghai, China), Zhiheng Li (School of Control Science and Engineering, Shandong University, Jinan, China), Jun Dai (Department of Computer Science, Worcester Polytechnic Institute, MA, USA), Xiaoyan Sun (Department of Computer Science, Worcester Polytechnic Institute, MA, USA), Ping Chen (Institute of Big Data, Fudan University, Shanghai, China and Purple Mountain Laboratories, Nanjing, China)

To safeguard the intellectual property of high-value deep neural networks, black-box watermarking has emerged as a critical defense and has gained increasing momentum. These methods embed watermarks into the model’s prediction behavior through strategically crafted trigger samples, enabling verification via API queries. Meanwhile, model extraction attacks threaten proprietary deep learning models by exploiting query access to replicate watermarked models. These attacks also offer insights into the resilience of watermarking schemes and adversarial capabilities. However, previous extraction attacks struggle to remove watermark information, so the replicated model inadvertently retains the defender's watermark. They are also inefficient, often requiring thousands of queries to achieve competitive performance.

To address these limitations, we propose a query-efficient model extraction framework named SSLExtraction. SSLExtraction selects queries via a greedy random walk in the feature space, leading to both effective model replication and watermark removal. Specifically, SSLExtraction follows the self-supervised learning paradigm to extract intrinsic data representations, transforming the original pixel-level inputs into watermark-agnostic features. Then, we propose a greedy random walk algorithm in the feature space to construct a well-dispersed query set that effectively covers the feature space while avoiding redundant queries. By selecting queries in the feature space, our method naturally identifies watermark patterns as outliers, enabling simultaneous watermark removal. Additionally, we propose an evaluation metric tailored for the watermarking task that emphasizes the distinction between benign and stolen models. Unlike previous approaches that rely on manually predefined thresholds, our evaluation metric employs hypothesis testing to measure the relative distance from a suspicious model to both a watermarked model and a benign model, identifying which of the two the suspicious model more closely resembles. Experimental results demonstrate that our method significantly reduces query costs compared to baselines while effectively removing watermarks across various datasets and watermarking scenarios.
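
The abstract does not spell out the greedy random walk, so the following is only a minimal sketch of the general idea of building a well-dispersed query set in feature space. It assumes the features are already available as plain coordinate tuples, and it uses a simple stand-in rule: at each step, sample a few random unvisited candidates and keep the one farthest from everything chosen so far. The function names, the `n_candidates` parameter, and the selection rule are all hypothetical and not taken from the paper; the sketch also does not model how SSLExtraction treats watermark patterns as outliers.

```python
import math
import random


def min_dist_to_selected(point, selected):
    """Distance from `point` to its nearest already-selected feature vector."""
    return min(math.dist(point, s) for s in selected)


def greedy_random_walk(features, budget, n_candidates=8, seed=0):
    """Return `budget` indices whose feature vectors are spread out.

    Hypothetical stand-in, not the paper's algorithm: each step draws a few
    random unvisited candidates (the "random walk" part) and keeps the one
    farthest from the points chosen so far (the "greedy" part).
    """
    rng = random.Random(seed)
    remaining = list(range(len(features)))

    # Start the walk from a random point.
    first = rng.choice(remaining)
    remaining.remove(first)
    chosen = [first]

    while len(chosen) < budget and remaining:
        candidates = rng.sample(remaining, min(n_candidates, len(remaining)))
        selected_feats = [features[j] for j in chosen]
        # Greedy step: prefer the candidate that is least redundant,
        # i.e. farthest from its nearest already-selected neighbour.
        best = max(
            candidates,
            key=lambda i: min_dist_to_selected(features[i], selected_feats),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy feature space: two tight clusters of 2-D points (illustrative data only).
toy_features = [(0.0, 0.0)] * 10 + [(10.0, 10.0)] * 10
query_indices = greedy_random_walk(toy_features, budget=2, n_candidates=20)
print(query_indices)
```

In this toy setting a budget of two queries lands one query in each cluster, which is the redundancy-avoiding behavior the abstract describes; a real pipeline would run the same kind of selection over self-supervised features rather than hand-made tuples.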
