Effects of Using Synthetic Data on Deep Recommender Models' Performance
Fatih Cihan Taskin, Ilknur Akcay, Muhammed Pesen, Said Aldemir, Ipek Iraz Esin, Furkan Durmus

TL;DR
This paper explores how synthetic data generation can mitigate data imbalance issues in recommender systems, leading to improved performance by enhancing data diversity and reducing bias.
Contribution
It introduces and evaluates six synthetic data generation methods to improve recommender system accuracy under data imbalance conditions.
Findings
Synthetic negative samples improve AUC scores.
Data augmentation reduces bias towards popular items.
Synthetic data helps address data sparsity.
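For readers unfamiliar with the metric in the first finding, AUC can be read as the probability that a randomly chosen positive interaction is scored above a randomly chosen negative one. The following is a minimal sketch of that pairwise definition in plain Python. The function name `auc_score` and the toy inputs are illustrative assumptions and are not taken from the paper.

```python
def auc_score(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Illustrative helper only; not the paper's evaluation code.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: one positive (score 0.3) is ranked below one negative (score 0.5),
# so 3 of the 4 positive/negative pairs are ordered correctly.
example_auc = auc_score([1, 1, 0, 0], [0.9, 0.3, 0.5, 0.1])
```

This quadratic loop is only meant to make the definition concrete; on real datasets a sort-based implementation from an established library would be the usual choice.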
Abstract
Recommender systems are essential for enhancing user experiences by suggesting items based on individual preferences. However, these systems frequently face the challenge of data imbalance, characterized by a predominance of negative interactions over positive ones. This imbalance can result in biased recommendations favoring popular items. This study investigates the effectiveness of synthetic data generation in addressing data imbalances within recommender systems. Six different methods were used to generate synthetic data. Our experimental approach involved generating synthetic data using these methods and integrating the generated samples into the original dataset. Our results show that the inclusion of generated negative samples consistently improves the Area Under the Curve (AUC) scores. The significant impact of synthetic negative samples highlights the potential of data…
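The abstract does not name the six generation methods, so the sketch below should not be read as any of them. It only illustrates the general workflow described above (generate negative samples, then merge them into the original dataset) using uniform random sampling of unobserved user-item pairs, a common baseline. The function `sample_negatives`, its parameters, and the toy data are hypothetical, and treating an unobserved pair as a negative is itself an assumption.

```python
import random


def sample_negatives(positives, all_items, n_per_user, seed=0):
    """Draw up to n_per_user unobserved items per user and label them 0.

    Hypothetical baseline: uniform sampling over items the user has not
    interacted with. Not one of the paper's six methods.
    """
    rng = random.Random(seed)
    seen = {}
    for user, item in positives:
        seen.setdefault(user, set()).add(item)
    negatives = []
    for user in sorted(seen):
        # Candidate negatives are items with no recorded positive interaction.
        candidates = sorted(set(all_items) - seen[user])
        k = min(n_per_user, len(candidates))
        for item in rng.sample(candidates, k):
            negatives.append((user, item, 0))
    return negatives


# Toy data: (user, item) pairs with a positive interaction.
positives = [("u1", "a"), ("u1", "b"), ("u2", "c")]
items = ["a", "b", "c", "d", "e"]

# Integrate the generated negatives into the original (positive) dataset.
augmented = [(u, i, 1) for u, i in positives] + sample_negatives(positives, items, 2)
```

The augmented list can then be fed to whatever model is being trained, with the label in the third position of each tuple.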
Taxonomy
Topics: Recommender Systems and Techniques
