Beyond Chat and Clicks: GUI Agents for In-Situ Assistance via Live Interface Transformation
Pan Hao, Rishi Selvakumaran, Jacob Sun, Qianwen Wang

TL;DR
This paper introduces DOMSteer, a browser extension that provides in-situ GUI assistance by live-manipulating web interfaces, improving user understanding and navigation without modifying underlying application code.
Contribution
It presents a novel design space and computational pipeline for DOM-mediated in-situ assistance, enabling GUI agents to dynamically reconfigure live web interfaces.
Findings
DOMSteer effectively delivers contextual assistance through DOM manipulations.
Quantitative evaluations show reliable and efficient support in complex interfaces.
A user study confirms the usability and effectiveness of DOMSteer.
Abstract
Complex visual interfaces are powerful yet have a steep learning curve, as users must navigate feature-rich visual interfaces while reasoning about domain-specific operations. Existing approaches either deliver assistance through a separate chat-based interaction, or require substantial application-specific engineering to build support natively into each interface. To address the gaps, we propose in-situ assistance: a mode of support delivered directly within any live web interface through lightweight, browser-level interventions on the Document Object Model (DOM), without rebuilding the application or modifying its underlying logic. We contribute a design space and a computational pipeline for DOM-mediated in-situ assistance, characterizing how GUI agents can insert, mutate, or recompose web elements to make the interface easier for users to understand, use, and navigate. We…
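To make the three operation families named in the abstract (insert, mutate, recompose) more concrete, the following is a minimal illustrative sketch, not DOMSteer's actual implementation. It uses a small in-memory tree as a stand-in for the DOM so it runs without a browser. The names `UINode`, `Intervention`, `findNode`, and `applyIntervention`, as well as the toolbar example, are assumptions introduced here for illustration.

```typescript
// Hypothetical stand-in for a DOM element; a real extension would operate on live DOM nodes.
interface UINode {
  tag: string;
  id: string;
  attrs: Record<string, string>;
  text?: string;
  children: UINode[];
}

// Assumed encoding of the three operation families mentioned in the abstract.
type Intervention =
  | { kind: "insert"; parentId: string; node: UINode }
  | { kind: "mutate"; targetId: string; attrs: Record<string, string> }
  | { kind: "recompose"; parentId: string; order: string[] };

// Depth-first search for a node by id.
function findNode(root: UINode, id: string): UINode | undefined {
  if (root.id === id) return root;
  for (const child of root.children) {
    const hit = findNode(child, id);
    if (hit) return hit;
  }
  return undefined;
}

// Applies one intervention in place; returns false if the target is missing.
function applyIntervention(root: UINode, step: Intervention): boolean {
  if (step.kind === "insert") {
    const parent = findNode(root, step.parentId);
    if (!parent) return false;
    parent.children.push(step.node); // add a new element, e.g. a hint
    return true;
  }
  if (step.kind === "mutate") {
    const target = findNode(root, step.targetId);
    if (!target) return false;
    Object.assign(target.attrs, step.attrs); // change attributes, e.g. a highlight
    return true;
  }
  // recompose: reorder existing children; ids not listed keep their relative order at the end
  const parent = findNode(root, step.parentId);
  if (!parent) return false;
  const order = step.order;
  const rank = (id: string): number => {
    const i = order.indexOf(id);
    return i === -1 ? order.length : i;
  };
  parent.children.sort((a, b) => rank(a.id) - rank(b.id));
  return true;
}

// Example: a toolbar with two buttons, and a three-step assistance plan.
const page: UINode = {
  tag: "div", id: "toolbar", attrs: {}, children: [
    { tag: "button", id: "export", attrs: {}, text: "Export", children: [] },
    { tag: "button", id: "filter", attrs: {}, text: "Filter", children: [] },
  ],
};

const plan: Intervention[] = [
  { kind: "insert", parentId: "toolbar",
    node: { tag: "span", id: "hint", attrs: {}, text: "Start with Filter", children: [] } },
  { kind: "mutate", targetId: "filter", attrs: { style: "outline: 2px solid orange" } },
  { kind: "recompose", parentId: "toolbar", order: ["hint", "filter"] },
];

const applied: boolean[] = plan.map((step) => applyIntervention(page, step));
const finalOrder: string[] = page.children.map((c) => c.id);
```

In a browser setting, the same plan would presumably map onto standard DOM calls such as `appendChild` and `setAttribute`, with the application's own code left untouched, which is the property the abstract emphasizes.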
