Zihao Liu (Iowa State University), Aobo Chen (Iowa State University), Yan Zhang (Iowa State University), Wensheng Zhang (Iowa State University), Chenglin Miao (Iowa State University)

Speech synthesis technologies, driven by advances in deep learning, have achieved remarkable realism, enabling applications across many domains. However, these technologies can also be exploited to generate fake speech, introducing significant risks. While existing fake speech detection methods have shown effectiveness in controlled settings, they often struggle to generalize to unseen scenarios, including new synthesis models, languages, and recording conditions. Moreover, many existing approaches rely on specific assumptions and lack comprehensive insights into the common artifacts inherent in fake speech. In this paper, we rethink the task of fake speech detection by proposing a new perspective focused on analyzing the spectrogram magnitude. Through extensive analysis, we find that synthetic speech consistently exhibits artifacts in the magnitude representation of the spectrogram, such as reduced texture detail and inconsistencies across magnitude ranges. Leveraging these insights, we introduce a novel assumption-free and generalized fake speech detection framework. The framework partitions spectrograms into layered representations based on magnitude and detects artifacts in both the spatial and discrete cosine transform (DCT) domains using 2D and 3D representations. This design enables the framework to capture fine-grained artifacts and synthesis inconsistencies inherent in fake speech. Extensive experiments demonstrate that the proposed framework achieves state-of-the-art performance on several widely used public audio deepfake datasets. Furthermore, evaluations in real-world scenarios involving black-box Web voice-cloning APIs highlight the framework's robustness and practical applicability: it consistently outperforms baseline methods.
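The abstract does not give implementation details, so the following is only a minimal sketch of what "partitioning a spectrogram into magnitude layers and viewing each layer in the DCT domain" could look like, not the authors' method. Everything here is an assumption made for illustration: the function names, the Hann-windowed STFT parameters, the log compression, the use of quantile bands to define layers, and the choice of four layers. Random noise stands in for a speech signal.

```python
import numpy as np

def magnitude_spectrogram(signal, n_fft=256, hop=128):
    """Log-magnitude STFT with a Hann window; shape is (freq, time)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    return np.log1p(np.abs(np.fft.rfft(frames, axis=0)))

def partition_by_magnitude(spec, n_layers=4):
    """Split a spectrogram into layers, one per magnitude quantile band.

    Assumption: bands are equal-population quantiles. Each layer keeps the
    bins that fall in its band and is zero elsewhere, so the layers sum
    back to the original spectrogram.
    """
    edges = np.quantile(spec, np.linspace(0.0, 1.0, n_layers + 1))
    band = np.clip(np.searchsorted(edges, spec, side="right") - 1, 0, n_layers - 1)
    return np.stack([np.where(band == k, spec, 0.0) for k in range(n_layers)])

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    mat[0, :] /= np.sqrt(2.0)
    return mat

def dct2(layer):
    """2D DCT-II of one layer, applied along frequency and then time."""
    n_freq, n_time = layer.shape
    return dct_matrix(n_freq) @ layer @ dct_matrix(n_time).T

# Placeholder input: white noise instead of real speech.
rng = np.random.default_rng(0)
signal = rng.standard_normal(4000)

spec = magnitude_spectrogram(signal)                # 2D spatial-domain view
layers = partition_by_magnitude(spec)               # stacked layers: a 3D volume
coeffs = np.stack([dct2(layer) for layer in layers])  # per-layer DCT-domain view
```

In this sketch the stacked `layers` array plays the role of a 3D representation and each slice a 2D one; a detector would then consume `layers` and `coeffs` as features. How the paper actually defines its layers and representations may differ.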
