Yan Pang (University of Virginia), Aiping Xiong (Penn State University), Yang Zhang (CISPA Helmholtz Center for Information Security), Tianhao Wang (University of Virginia)

Video generation models (VGMs) have demonstrated the capability to synthesize high-quality output. It is important to understand their potential to produce unsafe content, such as violent or terrifying videos. In this work, we provide a comprehensive understanding of unsafe video generation.

First, to confirm the possibility that these models could indeed generate unsafe videos, we choose unsafe content generation prompts collected from 4chan and Lexica, and three open-source SOTA VGMs to generate unsafe videos.
After filtering out duplicates and poorly generated content, we created an initial set of $2112$ unsafe videos from an original pool of $5607$ videos. Through clustering and thematic coding analysis of these generated videos, we identify $5$ unsafe video categories: textit{Distorted/Weird}, textit{Terrifying}, textit{Pornographic}, textit{Violent/Bloody}, and textit{Political}. With IRB approval, we then recruit online participants to help label the generated videos. Based on the annotations submitted by $403$ participants, we identified $937$ unsafe videos from the initial video set. With the labeled information and the corresponding prompts, we created the first dataset of unsafe videos generated by VGMs.

We then study possible defense mechanisms to prevent the generation of unsafe videos. Existing defense methods in image generation focus on filtering either input prompt or output results. We propose a new approach called fullsysname (sysname), which works within the model’s internal sampling process. sysname can achieve $0.90$ defense accuracy while reducing time and computing resources by $10times$ when sampling a large number of unsafe prompts. Our experiment includes three open-source SOTA video diffusion models, each achieving accuracy rates of $0.99$, $0.92$, and $0.91$, respectively. Additionally, our method was tested with adversarial prompts and on image-to-video diffusion models, and achieved nearly $1.0$ accuracy on both settings. Our method also shows its interoperability by improving the performance of other defenses when combined with them.

View More Papers

The State of https Adoption on the Web

Christoph Kerschbaumer (Mozilla Corporation), Frederik Braun (Mozilla Corporation), Simon Friedberger (Mozilla Corporation), Malte Jürgens (Mozilla Corporation)

Read More

Revealing the Black Box of Device Search Engine: Scanning...

Mengying Wu (Fudan University), Geng Hong (Fudan University), Jinsong Chen (Fudan University), Qi Liu (Fudan University), Shujun Tang (QI-ANXIN Technology Research Institute; Tsinghua University), Youhao Li (QI-ANXIN Technology Research Institute), Baojun Liu (Tsinghua University), Haixin Duan (Tsinghua University; Quancheng Laboratory), Min Yang (Fudan University)

Read More

BitShield: Defending Against Bit-Flip Attacks on DNN Executables

Yanzuo Chen (The Hong Kong University of Science and Technology), Yuanyuan Yuan (The Hong Kong University of Science and Technology), Zhibo Liu (The Hong Kong University of Science and Technology), Sihang Hu (Huawei Technologies), Tianxiang Li (Huawei Technologies), Shuai Wang (The Hong Kong University of Science and Technology)

Read More

SketchFeature: High-Quality Per-Flow Feature Extractor Towards Security-Aware Data Plane

Sian Kim (Ewha Womans University), Seyed Mohammad Mehdi Mirnajafizadeh (Wayne State University), Bara Kim (Korea University), Rhongho Jang (Wayne State University), DaeHun Nyang (Ewha Womans University)

Read More