Enhancing and Scaling Search Query Datasets for Recommendation Systems

Aaron Rodrigues; Mahmood Hegazy; Azzam Naeem

arXiv:2505.11176 · cs.IR · August 25, 2025

TL;DR

This paper introduces a scalable, data-centric system for improving search query datasets in banking recommendation systems, enhancing intent clarity and addressing cold start issues to boost recommendation accuracy.

Contribution

The paper presents an integrated system that combines synthetic query generation, intent disambiguation, and intent gap analysis, and reports improvements in dataset quality and recommendation precision.

Findings

01 Synthetic data shows comparable performance on Clinc150 and significant benefits on Banking77.

02 Intent disambiguation achieves an F1 score of 0.863, improving intent clarity.

03 Intent gap analysis recovers up to 71% of latent customer needs.

Abstract

This paper presents a deployed, production-grade system designed to enhance and scale search query datasets for intent-based recommendation systems in digital banking. In real-world environments, the growing volume and complexity of user intents create substantial challenges for data management, resulting in suboptimal recommendations and delayed product onboarding. To overcome these challenges, our approach shifts the focus from model-centric enhancements to automated, data-centric strategies. The proposed system integrates three core modules: Synthetic Query Generation, Intent Disambiguation, and Intent Gap Analysis. Synthetic Query Generation produces diverse and realistic user queries. Our experiments reveal no statistically significant difference when using synthetic data for Clinc150, while Banking77 and a proprietary dataset show significant differences. We dig into the…
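
The abstract reports whether synthetic data makes a statistically significant difference but, in the portion shown here, does not say which test was used. As a hedged illustration only, the sketch below shows one common way to compare two intent classifiers evaluated on the same test queries: a paired permutation (sign-flip) test on per-example correctness. The function name `paired_permutation_test` and its parameters are hypothetical and are not taken from the paper.

```python
import random


def paired_permutation_test(correct_a, correct_b, n_resamples=10_000, seed=0):
    """Two-sided paired permutation test on the accuracy difference.

    correct_a, correct_b: lists of 0/1 flags, one per test example,
    marking whether each model predicted that example's intent correctly.
    Illustrative only; not the procedure described in the paper.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both models must be scored on the same examples")
    if not correct_a:
        raise ValueError("need at least one test example")

    diffs = [a - b for a, b in zip(correct_a, correct_b)]
    n = len(diffs)
    observed = abs(sum(diffs)) / n

    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the two models are exchangeable,
        # so the sign of each per-example difference can be flipped.
        total = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(total) / n >= observed:
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_resamples + 1)
```

In this sketch, `correct_a` could hold the results of a model trained on real queries and `correct_b` those of a model trained with synthetic queries added; a large returned p-value would be consistent with "no statistically significant difference", while a small one would indicate the two training sets lead to measurably different accuracy.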
