SafeMobile: Chain-level Jailbreak Detection and Automated Evaluation for Multimodal Mobile Agents
Siyuan Liang, Tianmeng Fang, Zhe Liu, Aishan Liu, Yan Xiao, Jinyuan He, Ee-Chien Chang, Xiaochun Cao

TL;DR
This paper introduces SafeMobile, a system that detects jailbreak attempts in multimodal mobile agents by analyzing behavioral sequences and using large language models, enhancing security in complex interaction scenarios.
Contribution
The work presents a novel risk discrimination mechanism and an automated assessment scheme for mobile multimodal agents, addressing limitations of existing security measures.
Findings
Improved detection of risky behaviors in high-risk tasks.
Reduction in the probability of agents being jailbroken.
Enhanced recognition of security threats through behavioral sequence analysis.
Abstract
As multimodal foundation models are widely adopted in intelligent agent systems, scenarios such as mobile device control, intelligent assistant interaction, and multimodal task execution increasingly rely on agents driven by these large models. However, these systems are also increasingly exposed to jailbreak risks. Attackers may use specific inputs to induce an agent to bypass its original behavioral constraints and then trigger risky or sensitive operations, such as modifying settings, executing unauthorized commands, or impersonating the user, which poses new challenges to system security. Existing security measures for intelligent agents remain limited when facing complex interactions, especially in detecting potentially risky behaviors across multiple rounds of conversation or sequences of tasks. In addition, an efficient and…
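To make the idea of detecting risk across a sequence of tasks concrete, here is a minimal Python sketch. It is not the authors' method, which the truncated abstract does not specify; it only illustrates why a chain of actions can be flagged even when no single step looks dangerous. The action names, the weights in `RISK_WEIGHTS`, the `decay` factor, and the `threshold` are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical per-action risk weights (illustrative values, not from the paper).
RISK_WEIGHTS = {
    "open_app": 0.0,
    "read_screen": 0.1,
    "modify_settings": 0.5,
    "run_command": 0.6,
    "send_as_user": 0.7,
}

# Assumed default weight for actions missing from the table above.
DEFAULT_RISK = 0.2


@dataclass(frozen=True)
class Step:
    """One agent action in a task chain (hypothetical schema)."""
    action: str
    target: str


def step_risk(step: Step) -> float:
    """Risk of a single step, looked up in the weight table."""
    return RISK_WEIGHTS.get(step.action, DEFAULT_RISK)


def chain_risk(steps: list[Step], decay: float = 0.5) -> float:
    """Peak running risk over the chain.

    The running score carries over a decayed share of earlier risk, so
    several moderately risky steps in a row score higher than any one alone.
    """
    running = 0.0
    peak = 0.0
    for step in steps:
        running = running * decay + step_risk(step)
        peak = max(peak, running)
    return peak


def is_risky_chain(steps: list[Step], threshold: float = 0.8) -> bool:
    """Flag the chain when its peak running risk reaches the threshold."""
    return chain_risk(steps) >= threshold


if __name__ == "__main__":
    chain = [Step("modify_settings", "wifi"), Step("run_command", "shell")]
    print(is_risky_chain(chain))       # both steps together
    print(is_risky_chain(chain[1:]))   # the last step on its own
```

In this sketch, changing a setting and then running a command crosses the threshold together, while either step alone does not. A real detector would presumably replace the fixed weight table with a learned or model-based judgment of each step and of the sequence as a whole.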
Taxonomy
Topics: Access Control and Trust · Social Robot Interaction and HRI · Advanced Malware Detection Techniques
