Seeing is Believing: Vision-driven Non-crash Functional Bug Detection for Mobile Apps
Zhe Liu, Cheng Li, Chunyang Chen, Junjie Wang, Mengzhuo Chen, Boyu Wu, Yawen Wang, Jun Hu, Qing Wang

TL;DR
This paper introduces Trident, a vision-driven multi-agent system that leverages multimodal large language models to detect non-crash functional bugs in mobile app GUIs, substantially improving recall and precision over existing methods.
Contribution
It presents a novel multi-agent approach utilizing multimodal large language models for effective non-crash bug detection in mobile app GUIs, addressing visual-text alignment and logical inference challenges.
Findings
Achieves 14%-112% higher recall and 108%-147% higher precision than baselines.
Detects 43 previously unknown bugs in apps on Google Play, 31 of which have been fixed.
Confirms the contribution of each module through ablation studies.
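Because precision cannot exceed 100%, the 108%-147% figures must be relative gains over each baseline rather than percentage-point differences, and the recall range (up to 112%) must be read the same way. Under that reading (our interpretation; the summary does not state the formula), for a metric M where M_T is Trident's value and M_B a baseline's value:

```latex
\Delta_{\mathrm{rel}} = \frac{M_T - M_B}{M_B} \times 100\%,
\qquad \Delta_{\mathrm{rel}} = 108\% \;\Rightarrow\; M_T = 2.08\,M_B,
\qquad \Delta_{\mathrm{rel}} = 14\% \;\Rightarrow\; M_T = 1.14\,M_B .
```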
Abstract
Mobile app GUI (Graphical User Interface) pages now contain rich visual information, and the visual semantics of each page help users understand the application logic. However, this complex visual and functional logic presents new challenges for software testing. Existing automated GUI testing methods, constrained by the lack of reliable test oracles, are limited to detecting crash bugs with obvious abnormal signals. Consequently, many non-crash functional bugs, ranging from unexpected behaviors to logical errors, often evade current techniques. Although these non-crash functional bugs can exhibit visual cues that serve as potential test oracles, they often span a sequence of screenshots, and detecting them requires understanding the operational logic across GUI page transitions, which is challenging for traditional techniques. Considering the remarkable…
Taxonomy
Topics: Web Data Mining and Analysis · Robotics and Automated Systems · Multimodal Machine Learning Applications
