UISearch: Graph-Based Embeddings for Multimodal Enterprise UI Screenshots Retrieval
Maroun Ayli, Youssef Bakouny, Tushar Sharma, Nader Jalloul, Hani Seifeddine, Rima Kilany

TL;DR
This paper introduces UISearch, a graph-based embedding method for enterprise UI screenshots that improves retrieval accuracy by capturing structural, visual, and semantic properties, outperforming existing vision-based methods.
Contribution
The paper presents a novel graph-based UI representation and a contrastive graph autoencoder for embeddings, enabling more discriminative and structured UI retrieval in enterprise settings.
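As a rough illustration of what an attributed UI graph could look like, the sketch below builds one from detected element bounding boxes. It is not the paper's pipeline: the `Element` fields, the `build_ui_graph` helper, the edge labels (`contains`, `above`, `beside`), and the toy screen are all assumptions chosen for illustration. Hierarchy is approximated by bounding-box containment, and spatial arrangement by ordering siblings top-to-bottom, then left-to-right.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """A detected UI element (hypothetical schema, not taken from the paper)."""
    name: str
    kind: str
    box: tuple  # (x0, y0, x1, y1) in pixels


def contains(outer, inner):
    """True if box `outer` fully encloses a different box `inner`."""
    return (outer != inner
            and outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


def build_ui_graph(elements):
    """Return (nodes, edges): node attributes plus labelled directed edges."""
    nodes = {e.name: {"kind": e.kind, "box": e.box} for e in elements}
    edges = []
    parent = {}

    # Hierarchical edges: each element's parent is its smallest enclosing element.
    for child in elements:
        enclosing = [p for p in elements if contains(p.box, child.box)]
        if enclosing:
            p = min(enclosing, key=lambda e: area(e.box))
            parent[child.name] = p.name
            edges.append((p.name, child.name, "contains"))

    # Spatial edges: link consecutive siblings in reading order.
    siblings_by_parent = {}
    for e in elements:
        siblings_by_parent.setdefault(parent.get(e.name), []).append(e)
    for siblings in siblings_by_parent.values():
        ordered = sorted(siblings, key=lambda e: (e.box[1], e.box[0]))
        for a, b in zip(ordered, ordered[1:]):
            relation = "above" if a.box[3] <= b.box[1] else "beside"
            edges.append((a.name, b.name, relation))

    return nodes, edges


# Toy screen: a toolbar with two buttons, followed by a table.
elements = [
    Element("screen", "window", (0, 0, 100, 100)),
    Element("toolbar", "container", (0, 0, 100, 10)),
    Element("save", "button", (5, 2, 25, 8)),
    Element("cancel", "button", (30, 2, 50, 8)),
    Element("table", "grid", (0, 20, 100, 90)),
]
nodes, edges = build_ui_graph(elements)
for edge in edges:
    print(edge)
```

A graph encoder would then consume the node attributes and labelled edges. The contrastive autoencoder itself is outside the scope of this sketch.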
Findings
Structural embeddings show better discriminative power than state-of-the-art vision encoders.
UISearch achieves 0.92 Top-5 accuracy on 20,396 UIs.
Median query latency is 47.5 ms, and the approach scales to more than 20,000 screens.
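To make the Top-5 accuracy figure concrete, here is a minimal sketch of one common way to compute Top-k retrieval accuracy over embeddings: the fraction of queries whose relevant item appears among the k nearest neighbours by cosine similarity. The paper's exact evaluation protocol is not stated in this summary, so the metric definition, the function names, and the toy vectors below are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, index, k):
    """Return the ids of the k indexed items most similar to `query`."""
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [item_id for item_id, _ in ranked[:k]]


def top_k_accuracy(queries, index, k):
    """Fraction of queries whose relevant id is among the top-k results."""
    hits = sum(1 for vec, relevant in queries if relevant in top_k(vec, index, k))
    return hits / len(queries)


# Toy index of (screen id, embedding) pairs; values are made up.
index = [
    ("login_v1", [1.0, 0.0, 0.0]),
    ("login_v2", [0.9, 0.1, 0.0]),
    ("dashboard", [0.0, 1.0, 0.0]),
    ("settings", [0.0, 0.0, 1.0]),
]

# Each query is (embedding, id of the screen considered relevant).
queries = [
    ([1.0, 0.05, 0.0], "login_v2"),
    ([0.0, 0.2, 1.0], "settings"),
    ([0.1, 1.0, 0.0], "settings"),
]

print(top_k_accuracy(queries, index, k=2))
```

This brute-force scan is linear in the number of indexed screens. A production system at the scale reported above would more likely use a vector index, which this sketch does not attempt to model.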
Abstract
Enterprise software companies maintain thousands of user interface screens across products and versions, creating critical challenges for design consistency, pattern discovery, and compliance checks. Existing approaches rely on visual similarity or text semantics, lacking explicit modeling of structural properties fundamental to user interface (UI) composition. We present a novel graph-based representation that converts UI screenshots into attributed graphs encoding hierarchical relationships and spatial arrangements, potentially generalizable to document layouts, architectural diagrams, and other structured visual domains. A contrastive graph autoencoder learns embeddings preserving multi-level similarity across visual, structural, and semantic properties. The comprehensive analysis demonstrates that our structural embeddings achieve better discriminative power than state-of-the-art…
Taxonomy
Topics: Data Visualization and Analytics · Multimodal Machine Learning Applications · Software Engineering Research
