Yiluo Wei (The Hong Kong University of Science and Technology (Guangzhou)), Peixian Zhang (The Hong Kong University of Science and Technology (Guangzhou)), Gareth Tyson (The Hong Kong University of Science and Technology (Guangzhou))

AI character platforms, which allow users to engage in conversations with AI personas, are a rapidly growing application domain. However, their immersive and personalized nature, combined with technical vulnerabilities, raises significant safety concerns. Despite the platforms' popularity, no systematic evaluation of their safety has been carried out. To address this gap, we conduct the first large-scale safety study of AI character platforms, evaluating 16 popular platforms using a benchmark set of 5,000 questions across 16 safety categories. Our findings reveal a critical safety deficit: AI character platforms exhibit an average unsafe response rate of 65.1%, substantially higher than the 17.7% average rate of the baselines. We further discover that safety performance varies significantly across characters and is strongly correlated with character features such as demographics and personality. Leveraging these insights, we demonstrate that our machine learning model is able to identify less safe characters with an F1-score of 0.81. This predictive capability can benefit platforms by enabling improved mechanisms for safer interactions, character search and recommendation, and character creation. Overall, our findings offer valuable insights for enhancing platform governance and content moderation for safer AI character platforms.
