LaSM: Layer-wise Scaling Mechanism for Defending Pop-up Attack on GUI Agents

Zihe Yan; Jiaping Gui; Zhuosheng Zhang; Gongshen Liu

arXiv:2507.10610 · cs.CR · April 8, 2026

TL;DR

LaSM is a layer-wise attention scaling method that enhances GUI agent robustness against pop-up attacks without retraining, by aligning attention with task-relevant regions.

Contribution

The paper identifies a layer-wise attention divergence pattern between correct and incorrect outputs and introduces LaSM, which amplifies attention and MLP modules in critical layers to improve defense success without additional training.

Findings

01 LaSM significantly increases attack defense success rate.

02 It maintains the model's general capabilities with negligible impact.

03 Attention misalignment is identified as a core vulnerability.

Abstract

Graphical user interface (GUI) agents built on multimodal large language models (MLLMs) have recently demonstrated strong decision-making abilities in screen-based interaction tasks. However, they remain highly vulnerable to pop-up-based environmental injection attacks, where malicious visual elements divert model attention and lead to unsafe or incorrect actions. Existing defense methods either require costly retraining or perform poorly under inductive interference. In this work, we systematically study how such attacks alter the attention behavior of GUI agents and uncover a layer-wise attention divergence pattern between correct and incorrect outputs. Based on this insight, we propose LaSM, a Layer-wise Scaling Mechanism that selectively amplifies attention and MLP modules in critical layers. LaSM improves the alignment between model saliency and task-relevant…
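
The idea of selectively amplifying attention and MLP modules in critical layers can be illustrated with a toy residual stack. This is only a sketch under stated assumptions, not the authors' implementation: the names `ToyLayer`, `apply_layerwise_scaling`, `make_toy_layers`, and `run`, the scaling factor `alpha`, the choice of layer indices, and the decision to scale sublayer outputs (rather than weights) are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]


def toy_attn(x: Vector) -> Vector:
    # Stand-in for attention: every position receives the mean of all positions.
    mean = sum(x) / len(x)
    return [mean for _ in x]


def toy_mlp(x: Vector) -> Vector:
    # Stand-in for an MLP: ReLU followed by a fixed 0.5 gain.
    return [0.5 * max(0.0, xi) for xi in x]


@dataclass
class ToyLayer:
    """One residual block: x + attn_scale * attn(x), then + mlp_scale * mlp(x)."""
    attn: Callable[[Vector], Vector] = toy_attn
    mlp: Callable[[Vector], Vector] = toy_mlp
    attn_scale: float = 1.0  # 1.0 means the sublayer is left unmodified
    mlp_scale: float = 1.0

    def forward(self, x: Vector) -> Vector:
        a = self.attn(x)
        x = [xi + self.attn_scale * ai for xi, ai in zip(x, a)]
        m = self.mlp(x)
        return [xi + self.mlp_scale * mi for xi, mi in zip(x, m)]


def make_toy_layers(n: int) -> List[ToyLayer]:
    return [ToyLayer() for _ in range(n)]


def apply_layerwise_scaling(
    layers: Sequence[ToyLayer], critical: Sequence[int], alpha: float
) -> None:
    # Assumption: the same factor alpha is applied to both the attention and
    # the MLP contribution, and only in the layers listed in `critical`.
    for idx in critical:
        layers[idx].attn_scale = alpha
        layers[idx].mlp_scale = alpha


def run(layers: Sequence[ToyLayer], x: Vector) -> Vector:
    for layer in layers:
        x = layer.forward(x)
    return x


if __name__ == "__main__":
    layers = make_toy_layers(4)
    x0 = [1.0, 2.0]
    print("baseline:", run(layers, x0))
    # Hypothetical choice: treat layers 2 and 3 as "critical".
    apply_layerwise_scaling(layers, critical=[2, 3], alpha=1.5)
    print("scaled:  ", run(layers, x0))
```

Because the scaling is applied at inference time to existing sublayers, no parameters are updated, which matches the training-free framing of the abstract. Which layers count as critical and how large the factor should be are not stated in this listing.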

Code & Models

Repositories

YANGTUOMAO/LaSM (GitHub)
