Kavita Kumari (Technical University of Darmstadt), Maryam Abbasihafshejani (University of Texas at San Antonio), Alessandro Pegoraro (Technical University of Darmstadt), Phillip Rieger (Technical University of Darmstadt), Kamyar Arshi (Technical University of Darmstadt), Murtuza Jadliwala (University of Texas at San Antonio), Ahmad-Reza Sadeghi (Technical University of Darmstadt)

Recent advancements in synthetic speech generation, including text-to-speech (TTS) and voice conversion (VC) models, allow the generation of convincing synthetic voices, often referred to as audio deepfakes. These deepfakes pose a growing threat as adversaries can use them to impersonate individuals, particularly prominent figures, on social media or bypass voice authentication systems, thus having a broad societal impact. The inability of state-of-the-art verification systems to detect voice deepfakes effectively is alarming.
We propose a novel audio deepfake detection method, VoiceRadar, that augments machine learning with physical models to approximate frequency dynamics and oscillations in audio samples. This significantly enhances detection capabilities. VoiceRadar leverages two main physical models: (i) the Doppler effect to understand frequency changes in audio samples and (ii) drumhead vibrations to decompose complex audio signals into component frequencies. VoiceRadar identifies subtle variations, or micro-frequencies, in the audio signals by applying these models. These micro-frequencies are aggregated to compute the observed frequency, capturing the unique signature of the audio. This observed frequency is integrated into the machine learning algorithm’s loss function, enabling the algorithm to recognize distinct patterns that differentiate human-produced audio from AI-generated audio.
We constructed a new diverse dataset to comprehensively evaluate VoiceRadar, featuring samples from leading TTS and VC models. Our results demonstrate that VoiceRadar outperforms existing methods in accurately identifying AI-generated audio samples, showcasing its potential as a robust tool for audio deepfake detection.

View More Papers

Recurrent Private Set Intersection for Unbalanced Databases with Cuckoo...

Eduardo Chielle (New York University Abu Dhabi), Michail Maniatakos (New York University Abu Dhabi)

Read More

Rondo: Scalable and Reconfiguration-Friendly Randomness Beacon

Xuanji Meng (Tsinghua University), Xiao Sui (Shandong University), Zhaoxin Yang (Tsinghua University), Kang Rong (Blockchain Platform Division,Ant Group), Wenbo Xu (Blockchain Platform Division,Ant Group), Shenglong Chen (Blockchain Platform Division,Ant Group), Ying Yan (Blockchain Platform Division,Ant Group), Sisi Duan (Tsinghua University)

Read More

Eclipse Attacks on Monero's Peer-to-Peer Network

Ruisheng Shi (Beijing University of Posts and Telecommunications), Zhiyuan Peng (Beijing University of Posts and Telecommunications), Lina Lan (Beijing University of Posts and Telecommunications), Yulian Ge (Beijing University of Posts and Telecommunications), Peng Liu (Penn State University), Qin Wang (CSIRO Data61), Juan Wang (Wuhan University)

Read More

ReThink: Reveal the Threat of Electromagnetic Interference on Power...

Fengchen Yang (Zhejiang University; ZJU QI-ANXIN IoT Security Joint Labratory), Zihao Dan (Zhejiang University; ZJU QI-ANXIN IoT Security Joint Labratory), Kaikai Pan (Zhejiang University; ZJU QI-ANXIN IoT Security Joint Labratory), Chen Yan (Zhejiang University; ZJU QI-ANXIN IoT Security Joint Labratory), Xiaoyu Ji (Zhejiang University; ZJU QI-ANXIN IoT Security Joint Labratory), Wenyuan Xu (Zhejiang University; ZJU…

Read More