LaQual: A Novel Framework for Automated Evaluation of LLM App Quality
Yan Wang, Xinyi Hou, Yanjie Zhao, Weiguo Lin, Haoyu Wang, Junjun Si

TL;DR
LaQual is an automated, scenario-adaptive framework that evaluates LLM app quality by combining static metrics and dynamic, scenario-specific assessments, improving app ranking and user decision confidence.
Contribution
This paper introduces LaQual, a novel framework that automates LLM app quality evaluation using hierarchical classification, static indicators, and dynamic scenario-adaptive metrics, outperforming baseline methods.
Findings
High correlation with human judgments (rho > 0.60)
Reduces candidate apps by up to 81.3%
Outperforms baselines in decision confidence and efficiency
Abstract
LLM app stores are quickly emerging as platforms that gather a wide range of intelligent applications based on LLMs, giving users many choices for content creation, coding support, education, and more. However, the current methods for ranking and recommending apps in these stores mostly rely on static metrics like user activity and favorites, which makes it hard for users to efficiently find high-quality apps. To address these challenges, we propose LaQual, an automated framework for evaluating the quality of LLM apps. LaQual consists of three main stages: first, it labels and classifies LLM apps in a hierarchical way to accurately match them to different scenarios; second, it uses static indicators, such as time-weighted user engagement and functional capability metrics, to filter out low-quality apps; and third, it conducts a dynamic, scenario-adaptive evaluation, where the LLM itself…
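The second stage described above, static filtering, can be illustrated with a minimal sketch. The abstract names time-weighted user engagement and functional capability metrics but gives no formulas, so everything concrete here is an assumption made for illustration: the exponential half-life decay, the `capability_score` field, the thresholds, and all function and class names are hypothetical and are not taken from LaQual.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AppStats:
    """Hypothetical per-app record; field names are illustrative only."""
    name: str
    events: list[tuple[date, int]]  # (day, number of user interactions that day)
    capability_score: float         # assumed functional-capability score in [0, 1]


def time_weighted_engagement(events, today, half_life_days=30.0):
    """Sum interaction counts, halving each count's weight every half_life_days.

    Exponential decay is an assumed weighting scheme, not the paper's.
    """
    total = 0.0
    for day, count in events:
        age_days = (today - day).days
        total += count * 0.5 ** (age_days / half_life_days)
    return total


def static_filter(apps, today, min_engagement=50.0, min_capability=0.5):
    """Keep apps that clear both (assumed) static thresholds."""
    return [
        app
        for app in apps
        if time_weighted_engagement(app.events, today) >= min_engagement
        and app.capability_score >= min_capability
    ]


today = date(2025, 1, 31)
apps = [
    AppStats("fresh", [(date(2025, 1, 31), 100)], 0.8),   # recent and capable
    AppStats("stale", [(date(2024, 1, 31), 1000)], 0.9),  # popular a year ago
    AppStats("weak", [(date(2025, 1, 31), 500)], 0.2),    # active but low capability
]
kept = [app.name for app in static_filter(apps, today)]
print(kept)
```

Under these assumed thresholds, only the recently active and capable app survives: the once-popular app decays below the engagement cutoff, which is the effect a time-weighted metric is meant to have compared with a raw lifetime count.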
