Odysseus: Jailbreaking Commercial Multimodal LLM-integrated Systems via Dual Steganography
Songze Li, Jiameng Cheng, Yiming Li, Xiaojun Jia, Dacheng Tao

TL;DR
Odysseus introduces a novel dual steganography method to covertly embed malicious content in images, effectively bypassing safety filters in commercial multimodal LLM systems and exposing security vulnerabilities.
Contribution
This paper presents Odysseus, a new jailbreak approach using dual steganography to evade safety filters in multimodal LLMs, revealing a critical security blind spot.
Findings
Achieves a success rate of up to 99% in bypassing safety filters
Reveals limitations of current defenses relying on explicit visibility of malicious content
Demonstrates effectiveness across multiple real-world MLLM systems
Abstract
By integrating language understanding with perceptual modalities such as images, multimodal large language models (MLLMs) constitute a critical substrate for modern AI systems, particularly intelligent agents operating in open and interactive environments. However, their increasing accessibility also raises heightened risks of misuse, such as generating harmful or unsafe content. To mitigate these risks, alignment techniques are commonly applied to align model behavior with human values. Despite these efforts, recent studies have shown that jailbreak attacks can circumvent alignment and elicit unsafe outputs. Currently, most existing jailbreak methods are tailored for open-source models and exhibit limited effectiveness against commercial MLLM-integrated systems, which often employ additional filters. These filters can detect and prevent malicious input and output content, significantly…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Authorship Attribution and Profiling · Topic Modeling
