SafeMobile: Chain-level Jailbreak Detection and Automated Evaluation for Multimodal Mobile Agents

Siyuan Liang; Tianmeng Fang; Zhe Liu; Aishan Liu; Yan Xiao; Jinyuan He; Ee-Chien Chang; Xiaochun Cao

arXiv:2507.00841 · cs.AI · July 2, 2025

TL;DR

This paper introduces SafeMobile, a system that detects jailbreak attempts in multimodal mobile agents by analyzing behavioral sequences and using large language models, enhancing security in complex interaction scenarios.

Contribution

The work presents a novel risk discrimination mechanism and an automated assessment scheme for mobile multimodal agents, addressing limitations of existing security measures.

Findings

01 Improved detection of risky behaviors in high-risk tasks.

02 Reduction in the probability of agents being jailbroken.

03 Enhanced recognition of security threats through behavioral sequence analysis.
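
To illustrate why analyzing a whole behavioral sequence can catch what a per-step check misses, here is a minimal sketch. It is not the paper's method: SafeMobile is described as combining behavioral sequence information with a large language model, whereas this stand-in uses a hand-written rule list. All names (`Action`, `RISKY_CHAINS`, `step_is_risky`, `chain_is_risky`, and the action labels) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str    # hypothetical action label, e.g. "open_app"
    target: str  # hypothetical target, e.g. "settings"


# Assumption: actions that are risky on their own (per-step view).
RISKY_STEPS = {("execute", "shell_command")}

# Assumption: ordered patterns that are risky only in combination (chain view).
RISKY_CHAINS = [
    (("open_app", "settings"), ("toggle", "screen_lock_off")),
    (("read", "sms_otp"), ("send_message", "unknown_contact")),
]


def step_is_risky(action: Action) -> bool:
    """Per-step check: looks at one action in isolation."""
    return (action.name, action.target) in RISKY_STEPS


def contains_subsequence(chain, pattern) -> bool:
    """True if `pattern` occurs in `chain` in order (gaps allowed)."""
    it = iter(chain)  # shared iterator, so matches must appear in order
    return all(any(step == wanted for step in it) for wanted in pattern)


def chain_is_risky(actions) -> bool:
    """Chain-level check: looks at the ordered sequence of actions."""
    chain = [(a.name, a.target) for a in actions]
    return any(contains_subsequence(chain, p) for p in RISKY_CHAINS)


# A trace in which every step looks benign when judged alone.
trace = [
    Action("open_app", "messages"),
    Action("read", "sms_otp"),
    Action("open_app", "browser"),
    Action("send_message", "unknown_contact"),
]

print(any(step_is_risky(a) for a in trace))  # per-step verdict
print(chain_is_risky(trace))                 # chain-level verdict
```

In this toy trace no single action matches a risky step, yet the ordered pair "read a one-time code, then message an unknown contact" matches a chain pattern. A learned or LLM-based discriminator would replace the fixed rule list in a real system.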

Abstract

As multimodal foundation models are widely adopted in intelligent agent systems, scenarios such as mobile device control, intelligent assistant interaction, and multimodal task execution increasingly rely on agents driven by large models. These systems are, however, increasingly exposed to jailbreak risks. Through crafted inputs, attackers may induce an agent to bypass its original behavioral constraints and trigger risky, sensitive operations such as modifying settings, executing unauthorized commands, or impersonating the user, which poses new challenges for system security. Existing security measures for intelligent agents remain limited in complex interactions, especially in detecting potentially risky behaviors across multi-turn conversations or task sequences. In addition, an efficient and…

Taxonomy

Topics: Access Control and Trust · Social Robot Interaction and HRI · Advanced Malware Detection Techniques