Evaluating LLM-Based Mobile App Recommendations: An Empirical Study

Quim Motger; Xavier Franch; Vincenzo Gervasi; Jordi Marco

arXiv:2510.18364 · cs.IR · October 22, 2025

TL;DR

This empirical study investigates how large language models recommend mobile apps, analyzing their ranking criteria, consistency, and responsiveness to instructions, revealing fragmented criteria and complex reasoning dynamics.

Contribution

The paper introduces a taxonomy of ranking criteria, a systematic evaluation framework, and a replication package for analyzing LLM-based app recommendations.

Findings

01

LLMs rely on fragmented ranking criteria only partially aligned with ASO metrics.

02

Top-ranked apps are consistent across runs, but variability increases with depth and specificity.

03

LLMs show varying sensitivity to explicit instructions, indicating complex reasoning dynamics.
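
Finding 02 can be made concrete with a small sketch. The code below is one possible way to quantify run-to-run consistency at different ranking depths, using the mean pairwise Jaccard overlap of the top-k apps. It is an illustrative assumption, not the metric used in the paper; the function names (`jaccard`, `mean_topk_overlap`) and the app names in `runs` are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty sets are considered identical
    return len(a & b) / len(a | b)


def mean_topk_overlap(runs, k):
    """Mean pairwise Jaccard overlap of the top-k items across runs.

    `runs` is a list of ranked lists, one per repeated prompt execution.
    """
    pairs = list(combinations(runs, 2))
    if not pairs:
        raise ValueError("need at least two runs to compare")
    return sum(jaccard(x[:k], y[:k]) for x, y in pairs) / len(pairs)


# Hypothetical ranked outputs from three runs of the same prompt.
runs = [
    ["AppA", "AppB", "AppC", "AppD"],
    ["AppA", "AppB", "AppD", "AppE"],
    ["AppA", "AppC", "AppB", "AppF"],
]

for k in (1, 2, 4):
    print(k, round(mean_topk_overlap(runs, k), 3))
```

In this made-up data the top-1 overlap is perfect and the overlap drops at greater depths, which mirrors the pattern the finding describes. A set-based measure ignores order within the top k; a rank-aware measure would be needed to capture position changes.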

Abstract

Large Language Models (LLMs) are increasingly used to recommend mobile applications through natural language prompts, offering a flexible alternative to keyword-based app store search. Yet the reasoning behind these recommendations remains opaque, raising questions about their consistency, explainability, and alignment with traditional App Store Optimization (ASO) metrics. In this paper, we present an empirical analysis of how widely used general-purpose LLMs generate, justify, and rank mobile app recommendations. Our contributions are: (i) a taxonomy of 16 generalizable ranking criteria elicited from LLM outputs; (ii) a systematic evaluation framework to analyse recommendation consistency and responsiveness to explicit ranking instructions; and (iii) a replication package to support reproducibility and future research on AI-based recommendation systems. Our findings reveal that LLMs…

Taxonomy

Topics: Mobile Health and mHealth Applications · Spreadsheets and End-User Computing · AI in Service Interactions