AgentLens: Adaptive Visual Modalities for Human-Agent Interaction in Mobile GUI Agents
Jeonghyeon Kim, Byeongjun Joung, Junwon Lee, Joohyung Lee, Taehoon Min, Sunjae Lee

TL;DR
AgentLens is a mobile GUI agent that adaptively switches between visual modalities to optimize human-agent interaction, balancing transparency and multitasking.
Contribution
It introduces a hybrid visual modality system for mobile GUI agents that combines adaptive communication actions with Virtual Display-based background execution.
Findings
85.7% of participants preferred AgentLens.
Achieved a high usability rating, with a PSSUQ score of 1.94 (lower PSSUQ scores indicate better usability).
Achieved a high adoption intent score of 6.43 out of 7.
Abstract
Mobile GUI agents can automate smartphone tasks by interacting directly with app interfaces, but how they should communicate with users during execution remains underexplored. Existing systems rely on two extremes: foreground execution, which maximizes transparency but prevents multitasking, and background execution, which supports multitasking but provides little visual awareness. Through iterative formative studies, we found that users prefer a hybrid model with just-in-time visual interaction, but the most effective visualization modality depends on the task. Motivated by this, we present AgentLens, a mobile GUI agent that adaptively uses three visual modalities during human-agent interaction: Full UI, Partial UI, and GenUI. AgentLens extends a standard mobile agent with adaptive communication actions and uses Virtual Display to enable background execution with selective visual…
