Chang Liu (University of Science and Technology of China), Jie Zhang (Nanyang Technological University), Tianwei Zhang (Nanyang Technological University), Xi Yang (University of Science and Technology of China), Weiming Zhang (University of Science and Technology of China), NengHai Yu (University of Science and Technology of China)

Nowadays, it is common to release audio content to the public, for social sharing or commercial purposes. However, with the rise of voice cloning technology, attackers can easily impersonate a specific person by using their publicly released audio without permission. It is therefore important to detect potential misuse of released audio content and to protect its timbre from being impersonated.

To this end, we introduce a novel concept, "Timbre Watermarking", which embeds watermark information into the target individual's speech so that voice cloning attacks can ultimately be defeated. There are two challenges: 1) robustness: the attacker can remove the watermark with common speech preprocessing before launching voice cloning attacks; 2) generalization: the attacker can choose from a variety of voice cloning approaches, making it hard to build a general defense against all of them.

To address these challenges, we design an end-to-end voice cloning-resistant detection framework. The core idea of our solution is to embed the watermark into the frequency domain, which is inherently robust against common data processing methods. A repeated embedding strategy is adopted to further enhance robustness. To generalize across different voice cloning attacks, we modulate their shared process and integrate it into our framework as a distortion layer. Experiments demonstrate that the proposed timbre watermarking can defend against different voice cloning attacks, exhibits strong resistance against various adaptive attacks (e.g., reconstruction-based removal attacks, watermark overwriting attacks), and is practical in real-world services such as PaddleSpeech, Voice-Cloning-App, and so-vits-svc. Ablation studies are also conducted to verify the effectiveness of our design. Some audio samples are available at https://timbrewatermarking.github.io/samples.
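
To make the idea of repeated frequency-domain embedding concrete, the following is a minimal sketch and not the framework described above, which is trained end to end. It assumes a much simpler hand-crafted scheme: each bit is encoded in the relative magnitude of a pair of FFT bins, and the full bit string is repeated in every frame so that extraction can take a majority vote. The names `embed`, `extract`, `FRAME`, the bin layout, and the `strength` parameter are all hypothetical choices made for illustration.

```python
import numpy as np

FRAME = 1024  # samples per frame (hypothetical choice)


def _bin_pairs(n_bits, start=40, step=4):
    """Pick one pair of FFT bins per bit (illustrative layout, not from the paper)."""
    a = start + step * np.arange(n_bits)
    return a, a + 2


def embed(audio, bits, strength=1.5):
    """Repeat the whole bit string in every frame of the signal.

    A bit is encoded by making one bin of its pair louder than the other.
    """
    audio = np.array(audio, dtype=float)
    bits = np.asarray(bits, dtype=bool)
    n_frames = len(audio) // FRAME
    frames = audio[: n_frames * FRAME].reshape(n_frames, FRAME)
    spec = np.fft.rfft(frames, axis=1)

    a, b = _bin_pairs(len(bits))
    mean = 0.5 * (np.abs(spec[:, a]) + np.abs(spec[:, b])) + 1e-8
    high, low = mean * strength, mean / strength

    # Keep the original phase; only the magnitudes of the chosen bins change.
    spec[:, a] = np.where(bits, high, low) * np.exp(1j * np.angle(spec[:, a]))
    spec[:, b] = np.where(bits, low, high) * np.exp(1j * np.angle(spec[:, b]))

    out = audio.copy()
    out[: n_frames * FRAME] = np.fft.irfft(spec, n=FRAME, axis=1).reshape(-1)
    return out


def extract(audio, n_bits):
    """Recover the bits by majority vote over all frames."""
    audio = np.asarray(audio, dtype=float)
    n_frames = len(audio) // FRAME
    frames = audio[: n_frames * FRAME].reshape(n_frames, FRAME)
    mag = np.abs(np.fft.rfft(frames, axis=1))
    a, b = _bin_pairs(n_bits)
    votes = mag[:, a] > mag[:, b]  # one vote per frame and bit
    return votes.mean(axis=0) > 0.5


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speech = 0.1 * rng.standard_normal(16000)  # stand-in for real speech
    message = rng.integers(0, 2, 16).astype(bool)
    marked = embed(speech, message)
    noisy = marked + 0.01 * rng.standard_normal(marked.shape)
    print("bit accuracy:", np.mean(extract(noisy, 16) == message))
```

Because every frame carries the full message, corrupting a few frames does not by itself destroy the watermark. This toy scheme assumes that extraction uses the same frame alignment as embedding, and it makes no attempt to survive voice cloning, which is the problem the distortion layer in the framework above is meant to address.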
