SampleLLM: Optimizing Tabular Data Synthesis in Recommendations
Jingtong Gao, Zhaocheng Du, Xiaopeng Li, Yichao Wang, Xiangyang Li, Huifeng Guo, Ruiming Tang, Xiangyu Zhao

TL;DR
SampleLLM is a two-stage framework that improves tabular data synthesis for recommendations: it uses LLMs and importance sampling to align the distribution of generated data with that of real data and to refine feature relationships.
Contribution
It introduces a novel two-stage approach combining LLMs with distribution alignment and feature importance sampling for enhanced tabular data synthesis in recommender systems.
Findings
Significantly outperforms existing methods on recommendation datasets.
Improves distribution alignment and feature relationship modeling.
Demonstrates effectiveness in online deployment.
Abstract
Tabular data synthesis is crucial in machine learning, yet existing general methods, primarily based on statistical or deep learning models, are highly data-dependent and often fall short in recommender systems. This limitation arises from their difficulty in capturing complex distributions and understanding feature relationships from sparse and limited data, along with their inability to grasp semantic feature relations. Recently, Large Language Models (LLMs) have shown potential in generating synthetic data samples through few-shot learning and semantic understanding. However, they often suffer from inconsistent distribution and lack of diversity due to their inherent distribution disparity with the target dataset. To address these challenges and enhance tabular data synthesis for recommendation tasks, we propose a novel two-stage framework named SampleLLM to improve the quality of…
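The summary above names importance sampling as the tool for pulling generated data toward the real distribution but gives no details of the procedure. As a rough illustration of the general idea only, and not the authors' method, the sketch below resamples synthetic rows with weights equal to an estimated density ratio p_real(x) / p_synth(x). It assumes a single categorical feature and simple frequency estimates; the function name `importance_resample`, the `genre` feature, and the toy rows are all hypothetical.

```python
import random
from collections import Counter


def importance_resample(real_rows, synth_rows, key, n_samples, seed=0):
    """Resample synth_rows so the marginal of `key` moves toward real_rows.

    Illustrative assumption: the density ratio is estimated from the
    empirical frequencies of one categorical feature.
    """
    real_counts = Counter(row[key] for row in real_rows)
    synth_counts = Counter(row[key] for row in synth_rows)
    n_real, n_synth = len(real_rows), len(synth_rows)

    weights = []
    for row in synth_rows:
        p_real = real_counts[row[key]] / n_real      # 0.0 if unseen in real data
        p_synth = synth_counts[row[key]] / n_synth   # > 0, the row is in synth_rows
        weights.append(p_real / p_synth)

    # Draw with replacement, proportionally to the importance weights.
    # random.choices raises ValueError if every weight is zero, i.e. if the
    # real and synthetic data share no value of `key`.
    rng = random.Random(seed)
    return rng.choices(synth_rows, weights=weights, k=n_samples)


# Toy data: real data is 80% drama, the synthetic data only 50%.
real_rows = [{"genre": "drama"}] * 8 + [{"genre": "comedy"}] * 2
synth_rows = [{"genre": "drama"}] * 5 + [{"genre": "comedy"}] * 5

resampled = importance_resample(real_rows, synth_rows, "genre", n_samples=10_000)
share = sum(row["genre"] == "drama" for row in resampled) / len(resampled)
print(f"drama share after resampling: {share:.3f}")  # close to the real share of 0.8
```

With sparse, high-dimensional recommendation features, a frequency-based ratio like this would break down quickly, which is presumably part of why the framework combines LLM generation with a more refined sampling step.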
Taxonomy
Topics: Recommender Systems and Techniques
