Songze Li (Southeast University), Jiameng Cheng (Southeast University), Yiming Li (Nanyang Technological University), Xiaojun Jia (Nanyang Technological University), Dacheng Tao (Nanyang Technological University)

Multimodal large language models (MLLMs), which combine text with other modalities, such as images, have demonstrated powerful capabilities and are increasingly adopted in real-world commercial systems. However, their growing accessibility also raises concerns about misuse, such as generating harmful content. To mitigate these risks, alignment techniques are commonly applied to align model behavior with human values. Despite these efforts, recent studies have shown that jailbreak attacks can circumvent alignment and elicit unsafe outputs. Currently, most existing jailbreak methods are tailored for open-source models and exhibit limited effectiveness against commercial MLLM-integrated systems, which often employ additional filters. These filters can detect and prevent malicious input and output content, significantly reducing jailbreak threats.

In this paper, we reveal that the success of these safety filters heavily relies on a critical assumption that malicious content must be explicitly visible in either the input or the output. This assumption, while often valid for traditional LLM-integrated systems, breaks down in MLLM-integrated systems, where attackers can leverage multiple modalities to conceal adversarial intent, leading to a false sense of security in existing MLLM-integrated systems. To challenge this assumption, we propose texttt{Odysseus}, a novel jailbreak paradigm that introduces dual steganography to covertly embed malicious queries and responses into benign-looking images. Our method proceeds through four stages: textbf{(1)} malicious query encoding, textbf{(2)} steganography embedding, textbf{(3)} model interaction, and textbf{(4)} response extraction. We first encode the adversary-specified malicious prompt into binary matrices and embed them into images using a steganography model. The modified image will be fed into the victim MLLM-integrated system. We encourage the victim MLLM-integrated system to implant the generated illegitimate content into a carrier image (via steganography), which will be used for attackers to decode the hidden response locally. Extensive experiments on benchmark datasets demonstrate that our texttt{Odysseus} successfully jailbreaks several pioneering and realistic MLLM-integrated systems, including GPT-4o, Gemini-2.0-pro, Gemini-2.0-flash, and Grok-3, achieving up to 99% attack success rate. It exposes a fundamental blind spot in existing defenses, and calls for rethinking cross-modal security in MLLM-integrated systems.

View More Papers

FirmCross: Detecting Taint-style Vulnerabilities in Modern C-Lua Hybrid Web...

Runhao Liu (National University of Defense Technology), Jiarun Dai (Fudan University), Haoyu Xiao (Fudan University), Yuan Zhang (Fudan University), Yeqi Mou (National University of Defense Technology), Lukai Xu (National University of Defense Technology), Bo Yu (National University of Defense Technology), Baosheng Wang (National University of Defense Technology), Min Yang (Fudan University)

Read More

LatticeBox: A Hardware-Software Co-Designed Framework for Scalable and Low-Latency...

ZhanPeng Liu (Peking University), Chenyang Li (Peking University), Wende Tan (Imperial College London), Yuan Li (Zhongguancun Laboratory), Xinhui Han (Peking University), Xi Cao (Science City (Guangzhou) Digital Technology Group Co., Ltd.), Yong Xie (Qinghai University), Chao Zhang (Tsinghua University)

Read More

The Dark Side of Flexibility: Detecting Risky Permission Chaining...

Xunqi Liu (State Key Laboratory of Integrated Services Networks, School of Cyber Engineering, Xidian University), Nanzi Yang (University of Minnesota), Chang Li (State Key Laboratory of Integrated Services Networks, School of Cyber Engineering, Xidian University), Jinku Li (State Key Laboratory of Integrated Services Networks, School of Cyber Engineering, Xidian University), Jianfeng Ma (State Key Laboratory…

Read More