Wei Zhao (Singapore Management University), Zhe Li (Singapore Management University), Yige Li (Singapore Management University), Jun Sun (Singapore Management University)

Multimodal Large Language Models (MLLMs) have demonstrated impressive capabilities in cross-modal understanding, but they remain vulnerable to adversarial attacks through visual inputs despite robust textual safety mechanisms. These vulnerabilities arise from two core weaknesses: the continuous nature of visual representations, which enables gradient-based attacks, and the inadequate transfer of text-based safety mechanisms to visual content. We introduce Q-MLLM, a novel architecture that integrates two-level vector quantization to create a discrete bottleneck against adversarial attacks while preserving multimodal reasoning capabilities. By discretizing visual representations at both the pixel-patch and semantic levels, Q-MLLM blocks attack pathways and bridges the cross-modal safety alignment gap. Our two-stage training methodology ensures robust learning while maintaining model utility. Experiments demonstrate that Q-MLLM achieves a significantly higher defense success rate against both jailbreak attacks and toxic image attacks than existing approaches. Notably, Q-MLLM achieves a perfect (100%) defense success rate against jailbreak attacks in all but one arguable case, while maintaining competitive performance on multiple utility benchmarks with minimal inference overhead. This work establishes vector quantization as an effective defense mechanism for secure multimodal AI systems, requiring neither expensive safety-specific fine-tuning nor detection overhead.
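The discrete bottleneck described above can be understood as a nearest-neighbour codebook lookup: each continuous visual embedding is snapped to its closest codebook entry, so small gradient-crafted perturbations that do not cross a codebook boundary are discarded before reaching the language model. The sketch below is a minimal illustration of this quantization step only, not the paper's architecture; the codebook size, embedding dimension, and perturbation scale are arbitrary assumptions.

```python
import numpy as np

def quantize(embeddings, codebook):
    """Snap each row of `embeddings` to its nearest row of `codebook`.

    Returns the quantized vectors and their discrete code indices.
    """
    # Pairwise squared distances between embeddings (N, D) and codes (K, D) -> (N, K)
    dists = ((embeddings[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)      # discrete code assignment per embedding
    return codebook[idx], idx

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))     # hypothetical: K=16 codes of dimension 4
clean = rng.normal(size=(8, 4))         # 8 patch embeddings
perturbed = clean + 1e-3 * rng.normal(size=clean.shape)  # tiny adversarial shift

_, codes_clean = quantize(clean, codebook)
_, codes_pert = quantize(perturbed, codebook)
# Fraction of patches whose discrete code is unchanged by the perturbation;
# perturbations too small to cross a codebook boundary are absorbed.
print((codes_clean == codes_pert).mean())
```

Because the language model only ever sees the discrete codes, a gradient-based attacker cannot smoothly steer the visual representation: any perturbation either leaves the code unchanged or causes a non-differentiable jump to a different codebook entry.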
