A Lightweight Vision-Language Fusion Framework for Predicting App Ratings from User Interfaces and Metadata

Azrin Sultana; Firoz Ahmed

arXiv:2602.20531·cs.CV·February 25, 2026

TL;DR

This paper introduces a lightweight multimodal framework that combines visual UI features with semantic text features to predict app ratings, improving over models that rely on a single data type.

Contribution

The paper fuses visual features from MobileNetV3 with textual features from DistilBERT through a gated fusion module to predict app ratings, with the overall design optimized for edge deployment.

Findings

01 Achieved a low mean absolute error (MAE) of 0.1060 in rating prediction

02 Demonstrated a high correlation coefficient of 0.9251

03 Validated the framework's effectiveness through extensive ablation studies
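
To make the reported metrics concrete, the following is a minimal sketch of how MAE, MSE, RMSE, and a correlation coefficient can be computed for rating predictions. It assumes the correlation in question is Pearson's r, which the summary does not state. The function names and the toy ratings are illustrative only and are not taken from the paper.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted ratings."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error between true and predicted ratings."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def pearson_r(y_true, y_pred):
    """Pearson correlation (an assumption; the summary only says 'correlation')."""
    mean_t = sum(y_true) / len(y_true)
    mean_p = sum(y_pred) / len(y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Toy data, made up for illustration; not results from the paper.
true_ratings = [4.0, 3.5, 4.5, 2.0]
pred_ratings = [4.1, 3.4, 4.4, 2.2]

print("MAE :", mae(true_ratings, pred_ratings))
print("MSE :", mse(true_ratings, pred_ratings))
print("RMSE:", rmse(true_ratings, pred_ratings))
print("r   :", pearson_r(true_ratings, pred_ratings))
```

MAE and RMSE are in the same units as the ratings themselves, while the correlation coefficient measures only how well predictions track the ordering and spread of the true ratings, which is why the two kinds of metric are usually reported together.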

Abstract

App ratings are among the most significant indicators of the quality, usability, and overall user satisfaction of mobile applications. However, existing app rating prediction models are largely limited to textual data or user interface (UI) features, overlooking the importance of jointly leveraging UI and semantic information. To address these limitations, this study proposes a lightweight vision-language framework that integrates both mobile UI and semantic information for app rating prediction. The framework combines MobileNetV3 to extract visual features from UI layouts and DistilBERT to extract textual features. These multimodal features are fused through a gated fusion module with Swish activations, followed by a multilayer perceptron (MLP) regression head. The proposed model is evaluated using mean absolute error (MAE), root mean square error (RMSE), mean squared error (MSE),…
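
The gated fusion step described in the abstract can be illustrated with a small, dependency-free sketch. The abstract does not give the exact formulation, so this assumes one common variant: each modality is projected to a shared width with a Swish activation, a sigmoid gate is computed from the concatenated projections, and the fused vector is an elementwise blend `g * visual + (1 - g) * text`, followed by a two-layer MLP head. The class `GatedFusion`, its parameters, the random weights, and the toy feature vectors are all hypothetical stand-ins for MobileNetV3 and DistilBERT outputs.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def swish(x):
    """Swish activation: x * sigmoid(x)."""
    return x * sigmoid(x)


def init_linear(rng, n_out, n_in):
    """Random weights and zero biases for a dense layer (illustrative only)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(layer, x):
    """Apply a dense layer: W x + b."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class GatedFusion:
    """Hypothetical gated fusion plus MLP head; not the authors' implementation."""

    def __init__(self, visual_dim, text_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.proj_visual = init_linear(rng, hidden_dim, visual_dim)
        self.proj_text = init_linear(rng, hidden_dim, text_dim)
        self.gate = init_linear(rng, hidden_dim, 2 * hidden_dim)
        self.mlp_hidden = init_linear(rng, hidden_dim, hidden_dim)
        self.mlp_out = init_linear(rng, 1, hidden_dim)

    def fuse(self, visual, text):
        # Project both modalities to the same width, with Swish activations.
        hv = [swish(z) for z in linear(self.proj_visual, visual)]
        ht = [swish(z) for z in linear(self.proj_text, text)]
        # The gate sees both projections and outputs one weight per dimension.
        g = [sigmoid(z) for z in linear(self.gate, hv + ht)]
        fused = [gi * v + (1.0 - gi) * t for gi, v, t in zip(g, hv, ht)]
        return fused, g

    def predict(self, visual, text):
        fused, _ = self.fuse(visual, text)
        hidden = [swish(z) for z in linear(self.mlp_hidden, fused)]
        return linear(self.mlp_out, hidden)[0]


# Toy feature vectors standing in for image and text embeddings.
model = GatedFusion(visual_dim=4, text_dim=3, hidden_dim=2)
visual_features = [0.5, -0.2, 0.1, 0.9]
text_features = [0.3, 0.8, -0.5]
fused, gate = model.fuse(visual_features, text_features)
rating = model.predict(visual_features, text_features)
print("gate:", gate)
print("fused:", fused)
print("untrained rating output:", rating)
```

Because each gate value lies strictly between 0 and 1, every fused dimension is a convex combination of the two modality projections, which lets a trained model lean on whichever modality is more informative for a given app. The weights here are random, so the printed rating is meaningless until the parameters are trained.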

Taxonomy

Topics: Persona Design and Applications · Advanced Malware Detection Techniques · Web Data Mining and Analysis