Aug2Search: Enhancing Facebook Marketplace Search with LLM-Generated Synthetic Data Augmentation
Ruijie Xi, He Ba, Hao Yuan, Rishu Agrawal, Yuxin Tian, Ruoyan Kong, Arul Prakash

TL;DR
Aug2Search leverages LLM-generated synthetic data to significantly improve Facebook Marketplace's embedding-based retrieval, enhancing search relevance and diversity with up to 4% ROC_AUC gains.
Contribution
This work introduces a novel multimodal, multitask synthetic data augmentation framework using GenAI for EBR models, demonstrating its effectiveness in a large-scale social commerce setting.
Findings
Synthetic data generated by Llama models is highly coherent and diverse.
Training solely on synthetic data can outperform training on the original data in EBR tasks.
Up to 4% ROC_AUC improvement achieved with synthetic data augmentation.
Abstract
Embedding-Based Retrieval (EBR) is an important technique in modern search engines, enabling semantic matching between search queries and relevant results. However, search logging data on platforms like Facebook Marketplace lacks the diversity and detail needed for effective EBR model training, limiting the models' ability to capture nuanced search patterns. To address this challenge, we propose Aug2Search, an EBR-based framework that leverages synthetic data generated by Generative AI (GenAI) models in a multimodal and multitask approach to optimize query-product relevance. This paper investigates the capabilities of GenAI, particularly Large Language Models (LLMs), in generating high-quality synthetic data, and analyzes its impact on enhancing EBR models. We conducted experiments using eight Llama models and 100 million data points from Facebook Marketplace logs. Our synthetic data…
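For readers unfamiliar with the terms, the following is a minimal sketch of what embedding-based retrieval and the ROC_AUC metric mean in practice. It is not the Aug2Search model: the paper's encoders are learned, while the vectors, product names, and the helper functions `cosine`, `retrieve`, and `roc_auc` below are all made up for illustration. Products are ranked by cosine similarity to the query vector, and ROC_AUC is computed as the fraction of (relevant, irrelevant) pairs in which the relevant item receives the higher score.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(query_vec, product_vecs, k=2):
    """Return the k (product_id, score) pairs most similar to the query."""
    scored = [(pid, cosine(query_vec, vec)) for pid, vec in product_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

def roc_auc(labels, scores):
    """Pairwise ROC AUC: share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical 3-dimensional embeddings; a real EBR system would obtain these
# from trained query and product encoders.
products = {
    "bike": [0.9, 0.1, 0.0],
    "sofa": [0.0, 0.2, 0.9],
    "helmet": [0.7, 0.6, 0.1],
}
query = [1.0, 0.2, 0.0]  # stands in for an embedded query such as "road bike"

top = retrieve(query, products, k=2)
print([pid for pid, _ in top])

# Relevance labels (1 = relevant) paired with model scores for four items.
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

In this framing, a reported ROC_AUC gain means that the augmented model scores relevant query-product pairs above irrelevant ones more often than the baseline does.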
Taxonomy
Topics: Information Retrieval and Search Behavior · Expert finding and Q&A systems · Big Data and Digital Economy
Methods: LLaMA
