Friedemann Lipphardt (MPI-INF), Moonis Ali (MPI-INF), Martin Banzer (MPI-INF), Anja Feldmann (MPI-INF), Devashish Gosain (IIT Bombay)

Large language models (LLMs) are widely used for information access, yet their content moderation behavior varies sharply across geographic and linguistic contexts. This paper presents a first comprehensive analysis of content moderation patterns detected in over 700,000 replies from 15 leading LLMs evaluated from 12 locations using 1,118 sensitive queries spanning five categories in 13 languages.

We find substantial geographic variation, with moderation rates showing relative differences up to 60% across locations—for instance, soft moderation (e.g., evasive replies) appears in 14.3% of German contexts versus 24.9% in Zulu contexts. Category-wise, misc. (generally unsafe), hate speech, and sexual content are more heavily moderated than political or religious content, with political content showing the most geographic variability. We also observe discrepancies between online and offline model versions, such as DeepSeek exhibiting 15.2% higher relative soft moderation rates when deployed locally than via API. The response length (and time) analysis reveals that moderated responses are, on average, about 50% shorter than the unmoderated ones.

These findings have important implications for AI fairness and digital equity, as users in different locations receive inconsistent access to information. We provide the first systematic evidence of geographic cross-language bias in LLM content moderation and showcase how model selection vastly impacts user experience.

View More Papers

CoordMail: Exploiting SMTP Timeout and Command Interaction to Coordinate...

Ruixuan Li (Tsinghua University and Beijing National Research Center for Information Science and Technology), Chaoyi Lu (Zhongguancun Laboratory), Baojun Liu (Tsinghua University and Beijing National Research Center for Information Science and Technology), Yanzhong Lin (Coremail Technology Co. Ltd), Qingfeng Pan (Coremail Technology Co. Ltd), Jun Shao (Zhejiang Gongshang University and Zhejiang Key Laboratory of Big…

Read More

NeuroStrike: Neuron-Level Attacks on Aligned LLMs

Lichao Wu (Technical University of Darmstadt), Sasha Behrouzi (Technical University of Darmstadt), Mohamadreza Rostami (Technical University of Darmstadt), Maximilian Thang (Technical University of Darmstadt), Stjepan Picek (University of Zagreb & Radboud University), Ahmad-Reza Sadeghi (Technical University of Darmstadt)

Read More

A Deep Dive into Function Inlining and its Security...

Omar Abusabha (Sungkyunkwan University, South Korea), Jiyong Uhm (Sungkyunkwan University, South Korea), Tamer Abuhmed (Sungkyunkwan University, South Korea), Hyungjoon Koo (Sungkyunkwan University, South Korea)

Read More