Synthetic dataset generation methodology for Recommender Systems using statistical sampling methods, a Multinomial Logit model, and a Fuzzy Inference System

Vitor T. Camacho

arXiv:2212.14350·stat.AP·January 2, 2023

Open Access

TL;DR

This paper presents a comprehensive methodology for generating synthetic datasets for recommender systems, combining statistical sampling, a Multinomial Logit model, and fuzzy inference to simulate realistic user-item interactions.

Contribution

It introduces a novel approach that integrates multiple statistical and fuzzy logic techniques to create customizable synthetic datasets for recommender system research.
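
As a rough illustration of the Multinomial Logit component only, the sketch below turns per-item utilities into choice probabilities with the standard MNL formula, P(i) = exp(V_i) / sum_j exp(V_j), and samples user-item interactions from them. The utility form (user taste times item attribute), the Gaussian feature distributions, and the names `mnl_probabilities` and `simulate_interactions` are assumptions made for this example; this summary does not state the paper's actual utility specification, and the fuzzy inference step for ratings is not covered here.

```python
import math
import random


def mnl_probabilities(utilities):
    """Multinomial Logit choice probabilities: P(i) = exp(V_i) / sum_j exp(V_j)."""
    m = max(utilities)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def simulate_interactions(n_users, n_items, n_choices, seed=0):
    """Sample (user, item) pairs from MNL probabilities (illustrative only)."""
    rng = random.Random(seed)
    # Hypothetical latent features: one taste value per user, one attribute per item.
    user_taste = [rng.gauss(0.0, 1.0) for _ in range(n_users)]
    item_attr = [rng.gauss(0.0, 1.0) for _ in range(n_items)]
    interactions = []
    for u in range(n_users):
        # Assumed utility form, not taken from the paper.
        utilities = [user_taste[u] * item_attr[i] for i in range(n_items)]
        probs = mnl_probabilities(utilities)
        # Draw items with replacement according to the MNL probabilities.
        chosen = rng.choices(range(n_items), weights=probs, k=n_choices)
        interactions.extend((u, i) for i in chosen)
    return interactions


if __name__ == "__main__":
    for user, item in simulate_interactions(n_users=3, n_items=5, n_choices=2):
        print(user, item)
```

Fixing the seed makes the generated interactions reproducible, which is convenient when a synthetic dataset is meant to be shared or regenerated.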

Findings

01. Enables generation of datasets with diverse feature types

02. Simulates realistic user behavior and ratings

03. Facilitates research without real data constraints

Abstract

It is said that we live in the age of data, and that data is ubiquitous and readily available if one has the tools to harness it. That may well be true, but so is the opposite. It is increasingly common to start a data science project only to find oneself without quality data, whether because the needed features were never collected, because the data are insufficient, because of legal restrictions, or for other reasons. When this happens, either the project is prematurely abandoned, or similar datasets are searched for and used. However, finding a dataset that meets your needs in terms of features, type of ratings, etc., may not be an easy task; this is particularly the case for recommender systems. In this work, a methodology for the generation of synthetic datasets for recommender systems is presented, making it possible to overcome the obstacle of not having quality data in sufficient amount…

Taxonomy

Topics: Smart Systems and Machine Learning · Customer churn and segmentation