Enhancing Virtual Assistant Intelligence: Precise Area Targeting for Instance-level User Intents beyond Metadata
Mengyu Chen, Zhenchang Xing, Jieshan Chen, Chunyang Chen, Qinghua Lu

TL;DR
This paper presents a novel deep learning approach enabling virtual assistants to understand and target specific areas on application screens based solely on pixel data, without needing app metadata or modifications.
Contribution
We introduce a cross-modal deep learning pipeline that predicts operational areas for user intents directly from screen pixels, advancing instance-level intent understanding in virtual assistants.
Findings
Achieved 64.43% accuracy on the testing dataset
Demonstrated effectiveness of pixel-based intent targeting
Enabled understanding without app metadata or modifications
Abstract
Virtual assistants have become widely used by mobile phone users in recent years. Although their ability to process user intents has developed rapidly, virtual assistants on most platforms can handle only pre-defined high-level tasks, which require extra manual effort from developers. Instance-level user intents, which contain more detailed objectives arising in complex practical situations, have rarely been studied. In this paper, we explore virtual assistants capable of processing instance-level user intents based on the pixels of application screens, without requiring extra extensions on the application side. We propose a novel cross-modal deep learning pipeline, which understands the input vocal or textual instance-level user intents, predicts the targeting operational area, and detects the absolute button area on screens without any metadata of…
Taxonomy
Topics: AI in Service Interactions · Multimodal Machine Learning Applications · Sentiment Analysis and Opinion Mining
