Ahmad ALBarqawi (New Jersey Institute of Technology, Newark, NJ, USA), Mahmoud Nazzal (Old Dominion University, Norfolk, VA, USA), Issa Khalil (Qatar Computing Research Institute (QCRI), HBKU, Doha, Qatar), Abdallah Khreishah (New Jersey Institute of Technology, Newark, NJ, USA), NhatHai Phan (New Jersey Institute of Technology, Newark, NJ, USA)

The rapid rise of deepfake technology, which produces realistic but fraudulent digital content, threatens the authenticity of media. Deepfakes manipulate videos, images, and audio, spread misinformation, and blur the line between real and fake, which highlights the need for effective detection approaches. Traditional deepfake detection approaches often struggle with sophisticated, customized deepfakes, especially in terms of generalization and robustness against malicious attacks. This paper introduces ViGText, a novel approach that integrates images with Vision Large Language Model (VLLM) Text explanations within a Graph-based framework to improve deepfake detection. The novelty of ViGText lies in its integration of detailed explanations with visual data: explanations provide a more context-aware analysis than captions, which often lack specificity and fail to reveal subtle inconsistencies. ViGText systematically divides images into patches, constructs image and text graphs, and integrates them for analysis using Graph Neural Networks (GNNs) to identify deepfakes. Through multi-level feature extraction across the spatial and frequency domains, ViGText captures details that enhance its robustness and accuracy in detecting sophisticated deepfakes. Extensive experiments demonstrate that ViGText significantly enhances generalization and achieves a notable performance boost when detecting user-customized deepfakes. Specifically, average F1 scores rise from 72.45% to 98.32% under generalization evaluation, reflecting the model's superior ability to generalize to unseen, fine-tuned variations of Stable Diffusion models. As for robustness, ViGText achieves an increase of 11.1% in recall compared to other deepfake detection approaches against state-of-the-art foundation model-based adversarial attacks. ViGText limits classification performance degradation to less than 4% when it faces targeted attacks that exploit its graph-based architecture, while only marginally increasing execution cost. By combining granular visual analysis with textual interpretation, ViGText establishes a new benchmark for deepfake detection and provides a more reliable framework to preserve media authenticity and information integrity.
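
The patch-and-graph idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `build_patch_graph` and `attach_text_nodes`, the 4-connected grid layout, the patch size, and the sample explanation string are all assumptions made for illustration, and the abstract does not specify how text nodes are linked to patches. The sketch only shows one plausible way to form a combined image and text graph that a GNN could later consume.

```python
from itertools import product


def build_patch_graph(height, width, patch):
    """Build a grid graph over non-overlapping image patches.

    Returns (nodes, edges), where nodes are (row, col) patch coordinates
    and edges link 4-connected neighbouring patches (an assumed layout).
    """
    rows, cols = height // patch, width // patch
    nodes = list(product(range(rows), range(cols)))
    edges = set()
    for r, c in nodes:
        if r + 1 < rows:  # link to the patch below
            edges.add(((r, c), (r + 1, c)))
        if c + 1 < cols:  # link to the patch on the right
            edges.add(((r, c), (r, c + 1)))
    return nodes, edges


def attach_text_nodes(edges, explanations):
    """Add one text node per explanation and link it to its patch.

    `explanations` is a hypothetical list of (patch_coord, text) pairs,
    standing in for patch-level explanations produced by a VLLM.
    """
    merged = set(edges)
    for idx, (patch_coord, _text) in enumerate(explanations):
        merged.add((("text", idx), patch_coord))
    return merged


# A 64x64 image split into 16x16 patches gives a 4x4 grid of patch nodes.
nodes, edges = build_patch_graph(height=64, width=64, patch=16)
explanations = [((0, 1), "lighting on the left cheek looks inconsistent")]
graph = attach_text_nodes(edges, explanations)
```

In a fuller pipeline, each patch node would carry spatial and frequency-domain features and each text node an embedding of its explanation; those feature choices are left out here because the abstract does not detail them.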
