Scaling Search Relevance: Augmenting App Store Ranking with LLM-Generated Judgments

Evangelia Christakopoulou; Vivekkumar Patel; Hemanth Velaga; Sandip Gaikwad; Sean Suchter; Venkat Sundaranatha

arXiv:2602.23234 · cs.IR · March 10, 2026

TL;DR

This paper improves app store search relevance by generating textual relevance labels at scale with a fine-tuned LLM. Adding these labels to the ranker improves ranking quality and user conversion rate, especially for tail queries.

Contribution

The paper introduces a method for generating high-quality textual relevance labels with LLMs, which addresses the scarcity of expert-provided labels and improves search ranking effectiveness.

Findings

01. Offline NDCG improved for both behavioral and textual relevance

02. A/B testing showed a +0.24% increase in conversion rate

03. Significant gains in tail query performance
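
For readers unfamiliar with the offline metric named in the first finding, the following is a minimal sketch of NDCG computed separately against behavioral and textual labels. It is not the paper's evaluation code: the exponential gain formula, the label scales, and the example labels are all assumptions chosen for illustration.

```python
import math

def dcg(gains):
    """Discounted cumulative gain using exponential gain (an assumed convention)."""
    return sum((2 ** g - 1) / math.log2(rank + 2) for rank, g in enumerate(gains))

def ndcg(gains, k=None):
    """NDCG@k for graded labels listed in the order the ranker returned them."""
    top = gains if k is None else gains[:k]
    ideal = sorted(gains, reverse=True)
    ideal = ideal if k is None else ideal[:k]
    ideal_dcg = dcg(ideal)
    # A query with no relevant results has no meaningful ordering; score it 0.
    return dcg(top) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical labels for one query's ranked results (not data from the paper).
behavioral = [1, 0, 1, 0]  # assumed binary labels, e.g. downloaded or not
textual = [3, 1, 2, 0]     # assumed graded semantic-fit labels

scores = {"behavioral": ndcg(behavioral), "textual": ndcg(textual)}
```

Scoring the same ranking against each label type separately shows whether a change to the ranker helps one objective at the expense of the other.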

Abstract

Large-scale commercial search systems optimize for relevance to drive successful sessions that help users find what they are looking for. To maximize relevance, we leverage two complementary objectives: behavioral relevance (results users tend to click or download) and textual relevance (a result's semantic fit to the query). A persistent challenge is the scarcity of expert-provided textual relevance labels relative to abundant behavioral relevance labels. We first address this by systematically evaluating LLM configurations, finding that a specialized, fine-tuned model significantly outperforms a much larger pre-trained one in providing highly relevant labels. Using this optimal model as a force multiplier, we generate millions of textual relevance labels to overcome the data scarcity. We show that augmenting our production ranker with these textual relevance labels leads to a…

Taxonomy

Topics: Information Retrieval and Search Behavior · Text and Document Classification Technologies · Expert finding and Q&A systems