Generating Diverse Synthetic Datasets for Evaluation of Real-life Recommender Systems

Miha Malenšek; Blaž Škrlj; Blaž Mramor; Jure Demšar

arXiv:2412.06809 · cs.IR · December 11, 2024

TL;DR

This paper introduces a flexible, modular framework for generating diverse, high-quality synthetic datasets tailored for evaluating real-life recommender systems, addressing limitations of existing methods.

Contribution

The authors present a novel, open-source Python framework that enables controlled, customizable synthetic dataset generation for recommender system research.

Findings

01

Framework effectively isolates model behavior in diverse scenarios

02

Enables benchmarking and bias detection in recommender systems

03

Supports iterative modifications for specific experimental needs
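The benchmarking finding can be made concrete with a small sketch. A synthetic column whose true number of distinct values is known by construction gives an exact ground truth for a probabilistic counter. The example below uses linear counting as a stand-in estimator; it is not taken from the paper, and the helper names (`synthetic_column`, `linear_count`, `relative_error`) are hypothetical and do not reflect the framework's API.

```python
import hashlib
import math
import random


def synthetic_column(n_rows, cardinality, seed=0):
    """Return n_rows integer ids containing exactly `cardinality` distinct values.

    Hypothetical helper: every id appears at least once, so the true
    distinct count is known by construction.
    """
    if n_rows < cardinality:
        raise ValueError("n_rows must be at least cardinality")
    rng = random.Random(seed)
    column = list(range(cardinality))
    column += [rng.randrange(cardinality) for _ in range(n_rows - cardinality)]
    rng.shuffle(column)
    return column


def linear_count(values, n_slots=1 << 14):
    """Estimate the number of distinct values with linear counting.

    Each value is hashed into one of n_slots slots; the estimate is
    -n_slots * ln(fraction of slots left empty).
    """
    slots = bytearray(n_slots)  # one byte per slot, for simplicity
    for value in values:
        digest = hashlib.blake2b(str(value).encode(), digest_size=8).digest()
        slots[int.from_bytes(digest, "big") % n_slots] = 1
    empty = n_slots - sum(slots)
    if empty == 0:
        return float("inf")  # table saturated, estimate undefined
    return -n_slots * math.log(empty / n_slots)


def relative_error(n_rows, cardinality, n_slots=1 << 14):
    """Relative error of the estimate against the known cardinality."""
    column = synthetic_column(n_rows, cardinality)
    estimate = linear_count(column, n_slots)
    return abs(estimate - cardinality) / cardinality


if __name__ == "__main__":
    for cardinality in (500, 2000, 5000):
        error = relative_error(n_rows=50_000, cardinality=cardinality)
        print(f"cardinality={cardinality:5d}  relative error={error:.4f}")
```

Sweeping the cardinality while holding the slot count fixed shows how the estimator degrades as the table fills, which is the kind of controlled comparison a generator with tunable cardinality makes possible.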

Abstract

Synthetic datasets are important for evaluating and testing machine learning models. Real-life recommender systems are often evaluated on high-dimensional, sparse categorical datasets. Unfortunately, few existing solutions can generate artificial datasets with such characteristics. To fill this gap, we developed a novel framework for generating synthetic datasets that are diverse and statistically coherent. Our framework allows the creation of datasets with controlled attributes, enabling iterative modifications to fit specific experimental needs, such as introducing complex feature interactions, adjusting feature cardinality, or imposing specific distributions. We demonstrate the framework's utility through use cases such as benchmarking probabilistic counting algorithms, detecting algorithmic bias, and simulating AutoML searches. Unlike existing methods that…
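To illustrate what "controlled attributes" could look like in practice, here is a minimal standard-library sketch. It assumes Zipf-like value frequencies for each categorical column and a binary label driven by a two-feature interaction. None of the names (`generate_dataset`, `add_interaction_label`, the `exponent`, `base_rate`, and `boost` parameters) come from the paper or its framework; they are hypothetical and chosen only for illustration.

```python
import random
from itertools import accumulate


def zipf_weights(cardinality, exponent):
    """Unnormalized Zipf-like weights: the value of rank r gets 1 / r**exponent."""
    return [1.0 / (rank ** exponent) for rank in range(1, cardinality + 1)]


def generate_dataset(n_rows, cardinalities, exponent=1.2, seed=0):
    """Generate rows of categorical features with controlled cardinality.

    `cardinalities` maps a feature name to its number of possible values.
    Values are integer ids; id 0 is the most frequent (assumed skew).
    """
    rng = random.Random(seed)
    columns = {}
    for name, cardinality in cardinalities.items():
        cumulative = list(accumulate(zipf_weights(cardinality, exponent)))
        columns[name] = rng.choices(
            range(cardinality), cum_weights=cumulative, k=n_rows
        )
    return [
        {name: columns[name][i] for name in cardinalities}
        for i in range(n_rows)
    ]


def add_interaction_label(rows, feat_a, feat_b,
                          base_rate=0.05, boost=0.6, seed=1):
    """Attach a binary label whose rate depends on a feature interaction.

    The positive rate is `boost` only when both features take their most
    frequent value (id 0), and `base_rate` otherwise.
    """
    rng = random.Random(seed)
    for row in rows:
        interacts = row[feat_a] == 0 and row[feat_b] == 0
        probability = boost if interacts else base_rate
        row["label"] = int(rng.random() < probability)
    return rows


if __name__ == "__main__":
    spec = {"user_id": 1000, "item_id": 500, "device": 5}
    data = add_interaction_label(generate_dataset(5000, spec), "item_id", "device")
    print(data[0])
    print("positive rate:", sum(r["label"] for r in data) / len(data))
```

Because the interaction is planted explicitly, a model evaluated on such data can be checked for whether it recovers the `item_id` and `device` combination. Changing one entry of the cardinality mapping or the skew exponent gives the kind of iterative, single-factor modification the abstract describes.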

Taxonomy

Topics: Recommender Systems and Techniques